Part 1: Pricing Model and Virtual Warehouse Best Practices
Ever been surprised by the size of your Snowflake bill? Want to know how to make your workloads more cost efficient, and how to keep them that way? I'm writing a three-part series on these subjects, and this is Part 1.
Snowflake, one of the popular data cloud platforms, promises virtually unlimited compute power for your data warehouse workloads, letting you profit from data analytics faster. Gone are the days of under- or over-provisioning resources, since you pay for what you use.
However, with unlimited compute comes unlimited cost. If care is not taken, you may find yourself facing an obscenely high bill at the end of a month or a contract year, wondering what you signed up for. Without assistance, that uncertainty can escalate into regret and frustration, and you may start fantasizing about "repatriating" to on-premises data warehouses. Even worse, cost concerns hamper your ability to execute, undermining the very reason you moved to Snowflake in the first place.
Fear not: there are best practices and tools to help you manage and grow your data cloud investment responsibly. We've helped dozens of companies do exactly that and witnessed the benefits. In this post we share an overview of Snowflake's pricing model for compute/query workloads, along with some tips for achieving high cost efficiency.
Pricing model and best practices
Snowflake features usage-based pricing, but you are not charged per query. Instead, you set up virtual warehouses (Snowflake's term for its compute clusters) and pay for their uptime, at a unit price determined by their size. It is up to you to decide which warehouse each query is sent to.
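As a minimal sketch of what this looks like in SQL (the warehouse name REPORTING_WH and the specific settings are placeholders for illustration), you create a warehouse of a chosen size and then point your session at it:

```sql
-- Create a Small warehouse. Credits accrue per second while it runs,
-- at a rate determined by the size. (Name and settings are placeholders.)
CREATE WAREHOUSE IF NOT EXISTS REPORTING_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 300    -- suspend after 5 idle minutes
  AUTO_RESUME    = TRUE;  -- wake up when a query arrives

-- Route subsequent queries in this session to that warehouse.
USE WAREHOUSE REPORTING_WH;
```

The size you pick here directly sets the burn rate, since billing is tied to warehouse uptime rather than to individual queries.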
With that in mind, here are a few important guidelines for staying cost efficient:
- Shut down warehouses when you are not using them. Snowflake's auto-suspend feature (the AUTO_SUSPEND parameter) automatically shuts down idle warehouses after a period of inactivity, but you may want to tighten that parameter beyond the default setting to save more money; the first sketch after this list shows how. However, you need to balance reduced warehouse uptime against availability. An overly aggressive auto-suspend policy may shut a warehouse down right before a query arrives, and executing that query then requires resuming the warehouse, now with a cold cache, hurting cost efficiency and user experience. Finding the right balance is easier with proper monitoring of your warehouses and analysis of your workloads. SaaS tools like Bluesky automatically monitor and optimize your warehouse settings, including auto-suspend.
- Don't misuse larger warehouses. Without guardrails set up by tooling or policy, your Snowflake users may be incentivized to send their queries to the largest warehouse they can get their hands on; after all, who doesn't like their queries to finish sooner? But sending small queries to large warehouses can be incredibly wasteful: a $0.10 query could end up costing $10 if care is not taken. The second sketch after this list shows one way to spot such queries in your query history. On the other hand, figuring out the optimal warehouse size for a query takes non-trivial expertise. Stay tuned for future blog posts, or contact us if you'd like to learn more about this subject.
- Avoid runaway queries. Back in the old days of on-prem data warehouses, a user might throw a long-running query onto the cluster five minutes before leaving work, let it run overnight, and check whether the result was ready the next morning. Nightly workloads tend to be lighter than daytime ones, and the company was paying for electricity to keep the warehouse running anyway, so that was a perfectly legitimate and effective strategy. With a data cloud like Snowflake, however, remember that you pay for what you use. A long-running query that is not well thought out could end up timing out after 48 hours (the default STATEMENT_TIMEOUT_IN_SECONDS threshold), costing the company anywhere between tens of dollars (a nice meal) and thousands of dollars (a nice meal at a Michelin three-star restaurant). Ouch, right? An easy way to reduce the negative impact is to tune down the STATEMENT_TIMEOUT_IN_SECONDS parameter, as the third sketch after this list shows. But what is the right value? We don't want to time out valuable queries right before they finish, and workload analysis can help us find that value. A more advanced and thorough treatment involves training data cloud users to unlearn their old on-prem "best practices" and embrace the data cloud more fully.
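Tightening auto-suspend is a one-line change per warehouse. The sketch below is illustrative only: ANALYTICS_WH is a placeholder name, and 60 seconds is an example value rather than a recommendation; the right number depends on your own workload analysis.

```sql
-- Suspend ANALYTICS_WH after 60 seconds of inactivity, and let incoming
-- queries resume it automatically. (Name and value are placeholders.)
ALTER WAREHOUSE ANALYTICS_WH SET
  AUTO_SUSPEND = 60
  AUTO_RESUME  = TRUE;

-- Verify the current settings.
SHOW WAREHOUSES LIKE 'ANALYTICS_WH';
```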
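One way to spot small queries landing on oversized warehouses is to look at recent query history. The sketch below queries the standard SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (which requires access to the shared SNOWFLAKE database and has some ingestion latency); the one-second cutoff and the list of sizes are arbitrary illustrative choices.

```sql
-- Queries from the past week that finished in under a second
-- yet ran on a Large or bigger warehouse.
SELECT query_id,
       warehouse_name,
       warehouse_size,
       total_elapsed_time / 1000 AS elapsed_seconds,
       query_text
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND warehouse_size IN ('Large', 'X-Large', '2X-Large', '3X-Large', '4X-Large')
  AND total_elapsed_time < 1000   -- milliseconds
ORDER BY start_time DESC
LIMIT 100;
```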
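Tuning the statement timeout is likewise a parameter change, and it can be set at the account, warehouse, user, or session level. In the sketch below, the warehouse name ADHOC_WH and the two-hour value are placeholders to be replaced once you have analyzed your own workloads.

```sql
-- Cap query runtime on ADHOC_WH at 2 hours (7,200 seconds) instead of
-- the 48-hour default. (Name and value are placeholders.)
ALTER WAREHOUSE ADHOC_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

-- The same parameter can also be set more broadly or more narrowly:
-- ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;
-- ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;
```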
Feel free to share this post on Twitter or LinkedIn if you found it useful, and connect with me (LinkedIn, Twitter) if you'd like to chat more about data cloud cost efficiency.