How Data Observability Reduces Snowflake Costs

A surprising number of organizations say they’ve exceeded their public cloud budget by 13% and that the excess spending will likely increase by about 29% by the end of 2022. You might assume the cause is simply the volume of data to store and process, but that’s not the case: these organizations also say they waste about 32% of their cloud spending. So why do businesses spend more than they need to? And how do you fix the overspending and waste? The answer lies in how well you practice data observability in your organization.

Data observability has become increasingly important as organizations move to cloud-based architectures and adopt microservices-based approaches to application development. These data architectures are complex and dynamic, making it difficult to understand what is happening within the system at any given moment. Data observability addresses this by making the once nice-to-have data quality frameworks and governance policies actionable, so you can confidently create systems that maintain the quality of your data across multiple teams.

Snowflake is one of the more popular SaaS platforms offering data warehousing services. Data observability can help companies to reduce their Snowflake costs by improving the efficiency of their cloud resources, forecasting how much they’ll spend, and optimizing their performance. 

Data observability for the modern data stack

Gartner defines data observability as “the ability of an organization to have a broad visibility of its data landscape and multilayer data dependencies (like data pipelines, data infrastructure, data applications) at all times with an objective to identify, control, prevent, escalate and remediate data outages rapidly within expectable SLAs.” It describes your system’s capacity to find, analyze, and correct data problems in near real time using various technologies and methods. Data observability helps you detect and identify the primary causes of data discrepancies and provides preventative steps to improve the efficiency and reliability of your data systems.

How Snowflake’s pricing works

Snowflake bills you either on demand or through pre-purchased capacity. Paying on demand means you pay only for the services you consume: Snowflake calculates the total cost of the resources you used during the month, and you pay in arrears.

When you pre-purchase capacity, you pay for a specified capacity you believe your business needs. Then, your business is free to use up that capacity per month as it wishes. This option offers lower prices and more service options, and it suits long-term use.

More than the type of pricing, your total costs depend on the region, your resources, the Snowflake services you’re using, the plan you’re on, and the cloud provider (AWS, Azure, or Google Cloud). Most organizations choose the pay-on-demand option at first to get familiar with the system. It also lets them pay only for the resources they use without being locked into a long-term contract.

Regional pricing example

Let’s look at the difference in costs when you pay on demand in two regions. For example, a customer using Azure in Zurich, Switzerland, would pay a fixed rate of $50.50 per TB per month, while another in Washington would pay $40 per TB per month on demand. 
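To make the regional gap concrete, here is a minimal sketch that applies the two storage rates quoted above to the same data volume. The 10 TB figure and the function name are illustrative assumptions, not values from the Snowflake pricing guide.

```python
# Flat on-demand storage rates quoted above, in USD per TB per month.
STORAGE_RATE_USD_PER_TB = {
    "zurich": 50.50,
    "washington": 40.00,
}


def monthly_storage_cost(region: str, terabytes: float) -> float:
    """Return one month's on-demand storage bill for the given region."""
    return STORAGE_RATE_USD_PER_TB[region] * terabytes


if __name__ == "__main__":
    stored_tb = 10  # hypothetical data volume, chosen only for illustration
    zurich = monthly_storage_cost("zurich", stored_tb)
    washington = monthly_storage_cost("washington", stored_tb)
    print(f"Zurich:     ${zurich:.2f}")
    print(f"Washington: ${washington:.2f}")
    print(f"Difference: ${zurich - washington:.2f} per month")
```

At the same volume, the only thing driving the difference is the regional rate, which is why region belongs in any cost review.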

Snowflake offers three services: storage, compute, and cloud services. Storage is billed at a flat rate per TB, as in the example above, while compute and cloud services usage is measured in “Snowflake credits.” A Snowflake credit is a unit of measure that is consumed only when a customer uses Snowflake resources.

The examples below would incur additional costs depending on the plan and services you choose. For example, the customer in Zurich would have the following:

Source: Snowflake pricing guide

The customer in Washington would have costs like those seen here:

Source: Snowflake pricing guide

Snowflake’s pricing guide has several examples of companies using all three of its storage, compute, and cloud services and their total costs based on the resources they consume and how often they use them.

How data observability reduces Snowflake costs

Whether you pay on demand or pre-purchase capacity, you can prevent wastage by consistently observing and optimizing your data systems using data observability tools.

Optimizes resource performance

Standard practices help optimize a business’s data resources and ensure its data quality. For instance, enabling auto-suspend for all your virtual warehouses would automatically pause the warehouses when they’re not processing queries. You could also filter out data that should not be processed so that you reduce the amount of work done. 

With data observability, businesses can receive alerts whenever they stop following set standards or best practices. For example, it can send notifications when data is being processed when it should not be, or when requests and queries run without timeouts.
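As a rough illustration of this kind of alerting, the sketch below scans a list of warehouse settings and reports the ones that break two of the practices mentioned here: auto-suspend and query timeouts. The WarehouseConfig type, its field names, and the find_violations function are hypothetical; they are not part of Snowflake's or Acceldata's APIs, and a real tool would read these settings from the platform itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WarehouseConfig:
    """Hypothetical snapshot of one warehouse's settings (not a Snowflake API type)."""
    name: str
    auto_suspend_seconds: Optional[int]       # None or 0: never auto-suspends
    statement_timeout_seconds: Optional[int]  # None or 0: queries have no timeout


def find_violations(warehouses: List[WarehouseConfig]) -> List[str]:
    """Return one alert message per broken best practice."""
    alerts = []
    for wh in warehouses:
        if not wh.auto_suspend_seconds:
            alerts.append(f"{wh.name}: auto-suspend is disabled")
        if not wh.statement_timeout_seconds:
            alerts.append(f"{wh.name}: no query timeout is set")
    return alerts


if __name__ == "__main__":
    # Illustrative data only.
    fleet = [
        WarehouseConfig("etl_wh", auto_suspend_seconds=60, statement_timeout_seconds=3600),
        WarehouseConfig("adhoc_wh", auto_suspend_seconds=None, statement_timeout_seconds=0),
    ]
    for alert in find_violations(fleet):
        print(alert)
```

Running a check like this on a schedule, and routing the messages to whoever owns each warehouse, is the simplest form of the alerting described above.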

Data observability also helps analyze and manage bottlenecks, high data spillage, compilation time, data volume, resource allocation, and other aspects of data quality and management. It helps ensure that your resources are working as intended with low latency. It also allows you to plan for failures in your data architecture.

With Acceldata, you can track your data pipeline performance and quality inside and outside of Snowflake, thus optimizing your Snowflake costs. The Acceldata data observability platform helps companies follow Snowflake’s best practices, and when they’re in violation, Acceldata sends notifications and recommendations.

Provisioning efficiency and warehouse usage

Over-provisioning is a constant issue with Snowflake services, and it can create performance inefficiencies. Warehouse size is a factor: the larger your warehouse is, the more Snowflake credits you’ll use to process requests, but the faster the response will be. Data observability enables you to compare the pros and cons of larger and smaller warehouses so data teams can make adjustments based on the specific needs of your enterprise.
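The size trade-off can be put into numbers with a small sketch. It assumes the standard schedule in which each warehouse size consumes twice the credits per hour of the size below it, starting at 1 credit per hour for the smallest; confirm the rates for your edition in the Snowflake pricing guide. The runtimes are made-up inputs, and the function name is illustrative.

```python
# Assumed credits consumed per hour of runtime, doubling with each size step.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def credits_used(size: str, runtime_minutes: float) -> float:
    """Credits consumed by a warehouse of the given size running for runtime_minutes."""
    return CREDITS_PER_HOUR[size] * runtime_minutes / 60


if __name__ == "__main__":
    # Hypothetical workload: 60 minutes on a Small warehouse.
    print(credits_used("S", 60))  # baseline cost
    # If a Large warehouse finishes in a quarter of the time, the cost is unchanged.
    print(credits_used("L", 15))
    # If it only halves the runtime, the same work costs twice as much.
    print(credits_used("L", 30))
```

A larger warehouse pays for itself only when the speedup keeps pace with the higher credit rate, which is the comparison an observability tool needs real runtime data to make.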

Optimizing cloud resources usage

A typical culprit behind surprisingly high cloud spend is a lack of awareness of how cloud resources are used; they are often unused or overused, and either way, you’re paying a price. You might accidentally run test code on demand, which achieves very little for your business and is therefore not a good use of your budget. Or, more typically, you might pre-purchase far more capacity than you need and continue to foot the bill for capacity you never use.

The key is to know which resources you should provision and how much it will all cost. Data observability provides the insights necessary because it checks how large your data is (volume), how often you update it (freshness), and where and how often you access it (lineage) to offer recommendations on the most efficient way to use Snowflake’s services.

With data observability, you’re also able to forecast Snowflake costs so you can review, adapt, and plan accordingly. Again, Acceldata is a great example, providing a contract plan, current and projected spend analysis, department-level tracking, budgeting, and chargeback. Acceldata also helps you avoid the unexpected costs that businesses often run into when they switch to cloud platforms.
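A very simple form of spend forecasting is a run-rate projection: take the average daily spend so far this month and extend it to the end of the month. The sketch below shows only that naive approach; it is not how Acceldata computes its projections, and the function names, spend figures, and budget are illustrative assumptions.

```python
import calendar
from datetime import date
from typing import Sequence


def project_month_end_spend(daily_spend: Sequence[float], today: date) -> float:
    """Project the full month's spend from the average daily spend so far."""
    if not daily_spend:
        raise ValueError("need at least one day of spend to project")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    average_per_day = sum(daily_spend) / len(daily_spend)
    return average_per_day * days_in_month


def is_over_budget(projected_spend: float, monthly_budget: float) -> bool:
    """Return True when the projection exceeds the budget."""
    return projected_spend > monthly_budget


if __name__ == "__main__":
    # Hypothetical figures: ten days of spend in a 30-day month.
    spend_so_far = [100.0] * 10
    projected = project_month_end_spend(spend_so_far, date(2022, 6, 10))
    print(f"Projected month-end spend: ${projected:.2f}")
    print("Over budget" if is_over_budget(projected, 2500.0) else "Within budget")
```

Even a projection this crude gives a team an early warning partway through the month instead of a surprise on the invoice.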

A solution to optimize your Snowflake costs

For data to be valuable to a business, it needs to be reliable, accessible, and trustworthy. The good news is you can pay for what you need and optimize your cloud resources to fit your budget and computing processes. Start leveraging data observability in your data systems and structures to reduce your Snowflake costs. 

Acceldata can help you migrate quickly and affordably to Snowflake by providing visibility and insight that raise your data’s efficiency and reliability and reduce cost. If you need help figuring out how to begin, Acceldata offers a simple and effective integration that gets you started quickly.
