Global Information Provider Partners with Acceldata to Scale and Automate Data Reliability for Multi-Petabyte Data Environment
The customer is a global provider of market, risk, and economic information and analysis about specific companies and industries, serving more than 90% of the Fortune 500.
With more than 6,000 employees and $2 billion in revenue, the company maintains a massive repository of comprehensive information on 500 million organizations around the world, which it provides to its clients in almost all industries.
- Multi-petabyte database
- 500 million+ records in AWS EMR
- Continuous processing/checks for 500 billion rows
- Data checks based on 200+ rules
Data is the foundation of this company’s business. To protect and manage it, the company had long used software from a legacy data integration vendor.
However, the company was growing and modernizing, and its legacy solution could not keep up.
For instance, the company had no global view into or control over data quality, because it had to run data quality checks separately for each individual application.
The legacy solution’s data cataloging features also fell short, and it struggled to scale and to provide the automation the company needed.
The company chose Acceldata to provide enterprise data observability from the cloud. In particular, it valued Acceldata’s ability to:
- Catalog and create metadata for all data sources and data files
- Write all data quality and data validation rules using SQL, Python or Spark
- Run data quality and data validation rules on both static and streaming data
- Detect schema and attribute-based changes
- Track all rejected or bad records and send alerts/reports based on data
- Handle reference data management requirements
- Handle large data volumes, such as data files with 100+ million records
With Acceldata in place, the company reported the following results:
- Facilitated compliance with FTC guidelines for data quality reporting
- Increased customer satisfaction due to greater data consistency & higher reliability
- Reduced SEV1 data issues to zero
- Improved engineering outcomes with data quality automation
- With its previous vendor, rules processing took 30-44 minutes per rule and 7 days in total to process all the rules. With Acceldata, it takes 33 minutes to process all 2,000 rules on the data.
- Before Acceldata, data reliability rules processing took 15 days to cover less than 100% of rules. With Acceldata, it takes 2-4 hours to cover 100% of rules.
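Two of the capabilities listed earlier, writing data quality rules in Python and tracking rejected records, can be illustrated with a minimal sketch. This is plain Python and not Acceldata's API; the `Rule` class, the `run_rules` function, the rule names, and the sample records are all hypothetical and exist only to show the general pattern of applying named rules to records and collecting the ones that fail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    """A named data quality check (hypothetical, not an Acceldata type)."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


def run_rules(records: Iterable[dict], rules: list[Rule]):
    """Apply every rule to every record.

    Returns (passed, rejected), where rejected is a list of
    (record, names_of_failed_rules) tuples that could feed alerts or reports.
    """
    passed, rejected = [], []
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            passed.append(record)
    return passed, rejected


# Illustrative rules; the field names are assumptions for this example.
RULES = [
    Rule("name_present", lambda r: bool(str(r.get("name") or "").strip())),
    Rule(
        "employees_non_negative",
        lambda r: isinstance(r.get("employees"), int) and r["employees"] >= 0,
    ),
]

records = [
    {"name": "Acme Corp", "employees": 120},
    {"name": "", "employees": 15},
    {"name": "Globex", "employees": -3},
]

passed, rejected = run_rules(records, RULES)
for record, failed in rejected:
    print(f"rejected {record!r}: failed {', '.join(failed)}")
```

Keeping each rule as a named, independent check is what allows a platform to run rules centrally across applications instead of once per application, and the list of failed rule names per record is the raw material for rejected-record reports.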
Multi-Layer Data Observability
Enterprises are frequently challenged with managing and optimizing complex, large-scale data environments.
Multi-layer data observability correlates information across infrastructure, platform, processing and data layers to identify and alert on trouble spots, bottlenecks, and inefficiencies.
Analytics and recommendations simplify remediation and administration. In addition, Acceldata provides an extensible library of auto-actions to make systems self-healing and self-tuning. The right data observability tools can significantly improve the reliability, performance, scale, and cost of enterprise data environments.
The Acceldata Solution
Acceldata delivers improved reliability, performance, and efficiency of data processing at scale.
- Predict & Prevent Incidents
- Scale Performance
- Reduce Infrastructure Costs