
Maximizing Cost Efficiency in Azure Databricks Clusters


Over recent months, I have collaborated with a FinOps team to analyze and reduce the costs of numerous Azure Databricks clusters. I want to share my experience optimizing one specific Databricks cluster, which resulted in a 92% cost reduction and annual savings of approximately 190,000€. I hope this account assists you in your own cost optimization efforts.

Introduction — The Significance of Data

During my tenure as a Solution Architect at Amazon Web Services (AWS), I learned the critical role of good data in making informed decisions. Before making any adjustments to our Databricks Clusters, it is essential to comprehend how to accurately calculate the costs associated with the cluster we aim to optimize, which can be quite complex (Part 1 of the article).

Next, we should engage with the cluster or application owner to ascertain their requirements (e.g., is 24/7 operation necessary, is high processing power needed, or can delays be tolerated?) and proceed to optimize the Databricks cluster (Part 2 of the article). After implementing these changes, we can review the data several days later. If the Databricks cluster operates smoothly, we can analyze the data and savings (Part 3 of the article).

1. Calculate the Total Monthly Costs for Each Cluster

To assess the various components and their costs (IP addresses, storage, virtual networks, virtual machines, and Databricks DBUs), we need accurate information. I suggest two paths for achieving this:

  1. For experienced developers, utilize the Microsoft Generate Cost Details Report API to gather all relevant data for the Azure subscription hosting our Databricks clusters.

Important considerations when using the API:

  • When retrieving cost details for an Azure subscription, filter the output down to the Databricks-related entries, removing anything tied to other services or carrying no cost.
  • Adjust the results based on the Azure subscription type (Pay-as-you-go or Enterprise), as the output varies.
  • The API only returns data for one month or less per request, and that data must not be older than 13 months.

  2. Employ the KopiCloud Azure Databricks Cost Analysis tool.

I developed this tool drawing from my experience in API data extraction, filtering, and management, aiming to streamline the process of interpreting and managing Databricks data obtained from Microsoft and Databricks APIs. The tool features a user-friendly interface tailored for FinOps professionals wishing to access formatted data swiftly.

The tool exports data to Excel files, allowing for further manipulation or custom report creation.

With the KopiCloud Azure Databricks Cost Analysis tool, you can generate reports without spending weeks learning how to extract data from the Microsoft and Databricks APIs. Quick access to formatted data proves invaluable.

The tool generates various reports that can be exported to Excel, including:

  • Daily Databricks Total Cost: lists daily Databricks expenses over a specific period and the monthly total.
  • Cost per Databricks Cluster: outlines all Databricks clusters and their associated costs.
  • Cost per Databricks Job: details all Databricks jobs and their individual costs.
  • Cost per Owner: summarizes the total costs of all Databricks resources per owner.
  • Cost per Individual Cluster: useful for daily monitoring of a particular cluster.

And much more...

Furthermore, the tool produces both raw and formatted data, storing daily and total costs in local storage or Azure Blob Storage for custom PowerBI report generation.

1.1. Analyzing Azure Subscription Raw Data via Microsoft API

By employing the Microsoft API, we can inspect the raw data, revealing various resources.

Each line displays the costs in the default Azure subscription currency. To determine the cost of the Databricks cluster, we must sum all values within a defined timeframe. As the data is mixed, we need to utilize Tags for filtering and aggregating the necessary resources.
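If you take the API route described above, here is a minimal sketch of requesting one month of cost details and polling until the report is ready. It assumes the requests and azure-identity packages and an identity with Cost Management access on the subscription; error handling is omitted, and field names follow the Generate Cost Details Report documentation.

```python
import time
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder: your own subscription
SCOPE = f"/subscriptions/{SUBSCRIPTION_ID}"
URL = (f"https://management.azure.com{SCOPE}/providers/"
       "Microsoft.CostManagement/generateCostDetailsReport?api-version=2022-05-01")

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

# The API accepts at most one month of data per request (no older than 13 months).
body = {"metric": "ActualCost",
        "timePeriod": {"start": "2024-03-01", "end": "2024-03-31"}}

# The call is asynchronous: the 202 response carries a Location header to poll.
resp = requests.post(URL, json=body, headers=headers)
resp.raise_for_status()
poll_url = resp.headers["Location"]

while True:
    report = requests.get(poll_url, headers=headers).json()
    if report.get("status") == "Completed":
        break
    time.sleep(10)

# The completed report lists downloadable CSV blobs with one cost line per resource and meter.
for blob in report["manifest"]["blobs"]:
    print(blob["blobLink"])
```

Each downloaded CSV is the raw material for the tag-based filtering and aggregation described next.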

1.2. Using Tags to Identify Resources through Microsoft API

Databricks assigns tags to each Azure Resource, enabling us to discern which Databricks Cluster, Databricks SQL Warehouse, or Databricks Job utilizes a specific resource.

For instance, here is a sample set of cluster tags:

  • ClusterId: "0312-164736-cgmnqt42"
  • ClusterName: "Investment Cluster"
  • Creator: "[email protected]"
  • DatabricksEnvironment: "workerenv-6505895639557211"
  • Vendor: "Databricks"
  • application_display_name: "Investment Cluster"
  • application_environment: "Development"
  • application_short_name: "ic"
  • databricks-instance-name: "609fe7050b844039a8389930b10971e0"

Next, we should leverage the tags to extract vital information such as "ClusterId," "ClusterName," and the cluster owner ("Creator"). Afterward, we can call the Databricks API to gather data on Databricks Clusters, Databricks SQL Warehouses, or Databricks Jobs, cross-referencing the information for further insights.
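As an illustration, here is a rough sketch of that filtering and aggregation step using pandas. The column names ("Tags", "costInBillingCurrency") and the tag format are assumptions based on the cost details CSV layout; adjust them to whatever your export contains.

```python
import json
import pandas as pd

df = pd.read_csv("cost-details.csv")  # one of the CSVs downloaded from the report

def parse_tags(raw) -> dict:
    """Tags arrive as a JSON-like fragment; wrap it in braces and parse it."""
    if not isinstance(raw, str) or not raw.strip():
        return {}
    try:
        return json.loads("{" + raw + "}")
    except json.JSONDecodeError:
        return {}

df["tag_dict"] = df["Tags"].apply(parse_tags)

# Keep only lines created by Databricks (Vendor tag) that carry an actual cost.
mask = df["tag_dict"].apply(lambda t: t.get("Vendor") == "Databricks")
databricks = df[mask & (df["costInBillingCurrency"] > 0)].copy()

# Aggregate the cost per cluster using the ClusterName tag set by Databricks.
databricks["ClusterName"] = databricks["tag_dict"].apply(
    lambda t: t.get("ClusterName", "unknown"))
per_cluster = (databricks.groupby("ClusterName")["costInBillingCurrency"]
               .sum().sort_values(ascending=False))
print(per_cluster)
per_cluster.to_csv("databricks-cost-per-cluster.csv")
```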

1.3. Utilizing KopiCloud Azure Databricks Cost Analysis Tool to View Clean Azure Data

The tool employs the Microsoft API to filter all Databricks expenses, extract data, and display relevant information for exploration, sorting, or extraction.

This information can be exported as daily and/or total costs to local Excel files or stored in Azure Blob Storage. Generated Excel files can be reloaded at any time to save time, conduct additional calculations, or recreate reports.
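If you want to replicate that export yourself, a small sketch is shown below. The container name, file names, and connection string are placeholders; it assumes the per-cluster summary from the previous sketch was saved as a CSV, and requires pandas, openpyxl, and azure-storage-blob.

```python
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Reload the per-cluster summary produced earlier and write it as an Excel file.
per_cluster = pd.read_csv("databricks-cost-per-cluster.csv")
per_cluster.to_excel("databricks-cost-per-cluster.xlsx", index=False)  # needs openpyxl

# Upload the Excel file so a Power BI report (or a colleague) can pick it up later.
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob = blob_service.get_blob_client(container="finops-reports",
                                    blob="databricks-cost-per-cluster.xlsx")
with open("databricks-cost-per-cluster.xlsx", "rb") as fh:
    blob.upload_blob(fh, overwrite=True)
```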

1.4. Accessing Databricks API for Comprehensive Cluster Information

The Databricks API serves as a valuable resource for acquiring detailed information about Databricks clusters. We can utilize the Databricks Clusters List API at https://docs.databricks.com/api/azure/workspace/clusters/list to retrieve comprehensive data on every Databricks Cluster in a workspace.

From the JSON response, we can extract crucial data such as:

  • Cluster ID and Name
  • Cluster Owner
  • Cluster Specifications (Cores and Memory)
  • Spark Version
  • Autoscale Configuration (minimum and maximum nodes)

And much more...
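Here is a minimal sketch of that call with the requests package. The workspace URL and personal access token are placeholders, and only a handful of the returned fields are printed.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<databricks-personal-access-token>"                           # placeholder

resp = requests.get(f"{WORKSPACE_URL}/api/2.0/clusters/list",
                    headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

for cluster in resp.json().get("clusters", []):
    autoscale = cluster.get("autoscale", {})
    print(cluster["cluster_id"],
          cluster["cluster_name"],
          cluster.get("creator_user_name"),   # cluster owner
          cluster.get("node_type_id"),        # VM size, i.e. cores and memory
          cluster.get("spark_version"),
          autoscale.get("min_workers"),
          autoscale.get("max_workers"))
```

Cross-referencing these cluster IDs with the tagged Azure costs gives a complete per-cluster picture.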

1.5. Using KopiCloud Azure Databricks Cost Analysis Tool for Clean Databricks Data

The tool queries the Databricks Cluster API and, if needed, the Azure Virtual Machines API to compile a list of Databricks Clusters with all pertinent information.

Here is a list of Azure Databricks clusters:

And here are the detailed expenses for a single Databricks Cluster:

2. Optimizing the Databricks Cluster

We access the Databricks Workspace, select Compute from the menu, and then choose our Databricks Cluster under the All-purpose compute option.

Let's examine the properties in the Performance section.

2.1. Understanding the Databricks Cluster Settings

A few notes regarding the Performance settings:

  • No Termination: The no-termination option (Terminate after 0 minutes of inactivity) means the cluster operates 24/7 until manually stopped.
  • Photon Acceleration: This enhances job speed but incurs higher costs and may not be ideal for clusters that run continuously.
  • Virtual Machine Size: The virtual machines in this instance were large, featuring 16 cores and 64 GB of RAM per node.
  • Spot Instances: The Databricks Cluster was not utilizing spot instances.
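For reference, this is roughly how those settings surface in the Clusters API response; the node type below is just an example that matches 16 cores and 64 GB of RAM per node.

```python
# Illustrative excerpt of the original cluster spec (field names per the Clusters API).
before = {
    "node_type_id": "Standard_D16s_v3",                       # 16 cores, 64 GB per node
    "autotermination_minutes": 0,                              # 0 = never auto-terminate (24/7)
    "runtime_engine": "PHOTON",                                # Photon acceleration enabled
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},   # no spot instances
}
```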

2.2. Implementing Minor (Yet Significant) Adjustments

After discussions with the application owner, we decided to implement a few adjustments:

  • Halve the instance size, reducing from 16 cores/64 GB to 8 cores/32 GB.
  • Utilize Spot instances.
  • Set the cluster to terminate after 30 minutes of inactivity.
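The same changes can be applied through the UI or scripted against the Clusters Edit API. Below is a hedged sketch of the latter: the cluster ID echoes the sample tags shown earlier, the Spark version, node type, and autoscale values are placeholders, and the Edit API expects the full cluster specification rather than only the changed fields.

```python
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<databricks-personal-access-token>"                           # placeholder

payload = {
    "cluster_id": "0312-164736-cgmnqt42",                # cluster to modify
    "cluster_name": "Investment Cluster",
    "spark_version": "13.3.x-scala2.12",                 # keep the existing runtime
    "node_type_id": "Standard_D8s_v3",                   # halved: 8 cores / 32 GB per node
    "autoscale": {"min_workers": 1, "max_workers": 4},   # keep the existing autoscale range
    "autotermination_minutes": 30,                       # stop after 30 minutes of inactivity
    "azure_attributes": {
        "availability": "SPOT_WITH_FALLBACK_AZURE",      # spot VMs, fall back to on-demand
        "spot_bid_max_price": -1,                        # -1 = pay up to the on-demand price
    },
}

resp = requests.post(f"{WORKSPACE_URL}/api/2.0/clusters/edit",
                     json=payload,
                     headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
```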

3. Analyzing Data and Reviewing Costs and Savings

In this final phase, we analyze the data for January and March, comparing expenses and assessing the impact of our changes on daily and monthly costs.

3.1. January Daily Data

Let’s examine the expenses from January 1 to January 28, during which we operated the cluster 24/7 and incurred an average cost of 580€/day.

On January 29, we made a minor adjustment: we configured auto-termination for 120 minutes (2 hours), dramatically decreasing costs to an average of 250€/day. This change is reflected in the chart:

3.2. March Daily Data

Now, let’s evaluate the data from March after transitioning the cluster to Spot Instances and reducing node count.

This chart illustrates the daily expenses of the cluster:

The most notable cost reduction occurred during weekends, dropping from an average daily cost of around 580€ to just 3€.

To summarize the figures: prior to optimization (January), we were spending an average of 580€/day, equating to 17,980€/month. Following optimization (March), our average costs were 63.35€/day on weekdays and merely 3.33€/day on weekends, resulting in a total of roughly 1,356€/month.

4. Conclusion

Based on the analysis, we reduced average spending from 580€/day to 63€/day on weekdays and 3€/day on weekends. Generally, a month comprises about 21 weekdays and 9 weekend days.

Thus, in a typical 30-day month, we transitioned from expenses of approximately 17,400€/month to 1,360€/month, representing an impressive 92% reduction and estimated annual savings of 192,480€!
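For transparency, here is the back-of-the-envelope arithmetic behind those figures (small rounding differences aside):

```python
weekdays, weekend_days = 21, 9                  # a typical 30-day month

before = 580 * 30                               # 17,400 €/month running 24/7
after = 63.35 * weekdays + 3.33 * weekend_days  # ≈ 1,360 €/month after optimization

annual_savings = (round(before) - round(after)) * 12  # ≈ 192,480 €/year
reduction = 1 - after / before                        # ≈ 92 %
print(round(after), annual_savings, f"{reduction:.0%}")
```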

Thank you for reading! If you found this story helpful, please consider showing your support.

  • The KopiCloud Azure Databricks Cost Analysis tool is available for download on the KopiCloud website.
  • For assistance in reducing your Databricks Cluster costs, feel free to reach out to me on LinkedIn.
  • The post image was created using cost icons designed by Dewi Sari, available on Flaticon.
