How to Use Query Cost Attribution to Optimize Snowflake Data Cloud Cost

Vinoo Ganesh

Making any kind of informed business decision requires a solid understanding of the data. Whether you’re creating a new business initiative, developing a new product line, or understanding the current financials, having visibility into the underlying data is key. In the data cloud space, that generally means visibility into how efficiently your data cloud is performing from both a workload and cost perspective. As such, in any prospect meeting, our first question is always, “what kind of visibility do you have into your workload?” We’re particularly curious about visibility across key dimensions such as individual users and business units.

Unfortunately, the most common response we hear is, “we don’t.” This is unsurprising given the speed at which users ramp up their use of data cloud. Cost management and visibility are generally considered lower priority than generating business insights and value. Furthermore, the metrics provided by tools such as Snowflake (e.g. from query history table) are difficult to understand and operationalize. This combination makes attributing Snowflake data cloud cost a somewhat challenging and complex problem.

Let’s talk a bit about pricing. Snowflake’s pricing model generally relies on users paying for the uptime of their virtual warehouses. When a virtual warehouse is up, users are charged at a consistent rate, regardless of whether the warehouse has queries actively running or whether the warehouse sits idle. The longer the warehouse is up, the more query credits are consumed.

The Cost of A Query

Optimizing data cloud spend starts with understanding the components of that spend. On this journey, the most rational place to start is the queries themselves.  

SQL queries are powerful tools in the toolkit of data practitioners. Their flexibility and expressive nature, however, is also their downside. Queries that generate the same results can be written in a variety of ways - some significantly more performant than others. Scoring and providing additional cost insight into queries is a powerful tool in understanding the effectiveness of your queries.

Query cost attribution also enables you to calculate both aggregated Snowflake usage and cost per user and business unit. This provides both visibility and accountability on the organization and individual level so you can properly forecast and set reasonable quotas for future usage.

When it comes to cost optimization, rallying and making the organization committed to the cause is half of the battle. Some of our users refer to such visibility to cost efficiencies as the “Wall of Shame.” At Bluesky we like to call it “Workload of Significance.” :-)

We believe that query cost attribution and cost optimization efforts should be done in an ongoing and automated manner instead of one-off clean up exercises in finding and killing bad queries, so companies can achieve long term, sustainable growth.

Tackling The Challenges of “Pricing” Queries

Despite all of the aforementioned benefits, query cost attribution is a nontrivial challenge that  requires specific domain expertise. Let’s look at an example.

Let’s assume a situation where 3 queries run concurrently in the same warehouse. All we can see is the aggregate utilization. Now, let’s assume one of the queries is written in a non-optimal way. It would be incredibly difficult to surface this inefficiency without having the specialized tools to determine cost on a per-query basis. A naive scheme is to “explode” each query execution record into a set of per-second records (since Snowflake charging is at the granularity level of seconds), so that the per-second cost can then get aggregated, but this computation itself can be expensive

Query cost attribution is the first step in getting your data cloud under control. We’d love to hear about your experience and help you along this journey.  Alternatively, join other Snowflake users today for a free trial of , a SaaS product that finds areas of cost optimization for your Snowflake workload.