Snowflake continues to be a go-to data tool for enterprises of all sizes. Since more of those companies are looking to Acceldata to help them optimize their Snowflake environments, we figured it might be a good time to show data junkies how they can use Acceldata in conjunction with Snowflake to enjoy a huge boost in analyst productivity and data culture.
In order to fully appreciate Snowflake’s value, it’s useful to quickly review the typical enterprise’s data users and their behaviors. Acceldata recently did a study of Snowflake usage by data leaders and data practitioners; you can see the results in, The Snowflake Data Experience: How Snowflake Users Optimize Their Data with Data Observability.
One of the key values of Snowflake is that it democratizes data access. Given that Snowflake is backed by object storage, it allows organizations to treat enterprise data like a data lake with an elastic warehouse on top. Unlike other data warehouses, Snowflake does not support partitions or indexes.
Instead, Snowflake automatically divides large tables into micro-partitions, which are used to calculate statistics about the value ranges contained in each column. These statistics then determine which subsets of your data are actually needed to run queries. For most data practitioners, this shift to micro-partitions shouldn’t be an issue. In fact, many people choose to migrate to Snowflake because this approach reduces query latency.
Still, if you have partitions and indexes in your current ecosystem and are migrating to “clustering” models, you need to be able to build complete “trust” in the data. Trust in data requires not only a solid understanding of data schemas and lineage, but the ability to address seemingly minor issues like syntax errors thrown during migration (e.g., Snowflake SQL is case sensitive).
Unfortunately, traditional data quality analysis does not catch migration syntax errors. Tools like Looker can be used to spot-check inconsistencies and provide ad-hoc quality tests, but deeper and more systematic tools are often needed to create comprehensive trust in data.
This is where Acceldata into play. T vdates data from an existing warehouse, like Teradata, as the data is moved into Snowflake. In the video below, you can see how Torch samples data and provides profiling reports during the migration and reconciliation reports once it’s moved into Snowflake. This ensures data quality and identifies any lingering data quality issues.
Pairing AccelData with Snowflake provides four main benefits for enterprise data users:
The Acceldata Data Observability Cloud accelerates enterprise data migration to the cloud with its data observability service. Data reconciliation and quality impede and prevent large scale data migration to public clouds. With Acceldata Torch, enterprises can automatically reconcile large scale one-time migrations along with continuous validation of incoming mutable and immutable datasets generated from on-premise and cloud applications. As enterprises move data across multiple systems, Acceldata’s Torch service enables enterprises to upkeep data quality, enforce conformance to schemas, manage sensitive data, and prevent anomalous data drift.
Data Production errors: These are data quality issues that emanate during production of data in, for example, a banking system. Typical causes are Incorrect data entry or Insufficient technical controls to ensure data is entered into the system in the correct format .
Data Transmission errors: These are data quality errors that arise from data that is transmitted from data producer to data consumer via different hops. Some examples could be column data getting truncated due to non-standard field lengths. records (read rows of data) getting dropped due to logic written to discard non-standard data.
Data Consumption errors: This type of error occurs when it gets transmitted properly from producers to consumers, but the consumer doesn’t know how to use the data properly and applies incorrect rules. This is akin to not reading a “user guide” when you buy a new electronic device and destroying it with improper set up. Typical examples include Data Consumer assuming that the column “open date” means account open date, while the data producer may be using another field “acctopn date” as the column to produce account open date.