Optimizing Data Platform Performance with Databricks Cluster Upgrade and DevOps Automation


Data Engineering

Client Details:

The client is a leading global insurance provider headquartered in the U.S., offering a wide range of insurance products, including property, casualty, and specialty lines. With a presence in more than 80 countries and gross premium volume exceeding $9 billion, the company is recognized for its innovative approach to risk management and its commitment to delivering tailored insurance solutions.

Challenge:

Outdated Databricks runtimes and inconsistent configurations across environments were slowing data pipelines and causing integration problems in PySpark code, Spark SQL queries, and configuration files. The resulting performance degradation delayed the delivery of business insights and limited the team’s ability to scale.

Solution:

The team upgraded the clusters to Databricks Runtime 15.4 and executed a detailed performance optimization plan. More than 70 pipelines were audited and refactored for compatibility with the upgraded environment, and YAML and JSON configuration files were aligned with the newer Spark version and current security practices.
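Part of aligning configuration files is making sure every cluster definition points at the new runtime. The sketch below is illustrative only and assumes each JSON file holds a top-level "clusters" list whose entries use the Databricks `cluster_name` and `spark_version` fields; the file layout, the helper names, and the target string "15.4.x-scala2.12" (the version key commonly used for Databricks Runtime 15.4 LTS) are assumptions, not details taken from this project.

```python
import copy
import json
from pathlib import Path

# Assumed runtime version key for Databricks Runtime 15.4 LTS.
TARGET_SPARK_VERSION = "15.4.x-scala2.12"


def upgrade_cluster_specs(config: dict, target: str = TARGET_SPARK_VERSION) -> tuple[dict, list[str]]:
    """Return an upgraded copy of `config` plus the names of clusters that changed.

    Assumes (hypothetically) a layout like:
    {"clusters": [{"cluster_name": ..., "spark_version": ...}, ...]}
    """
    upgraded = copy.deepcopy(config)  # never mutate the caller's config
    changed: list[str] = []
    for cluster in upgraded.get("clusters", []):
        if cluster.get("spark_version") != target:
            cluster["spark_version"] = target
            changed.append(cluster.get("cluster_name", "<unnamed>"))
    return upgraded, changed


def upgrade_file(path: Path, dry_run: bool = True) -> list[str]:
    """Upgrade one JSON config file (rewritten only when dry_run is False)."""
    config = json.loads(path.read_text())
    upgraded, changed = upgrade_cluster_specs(config)
    if changed and not dry_run:
        path.write_text(json.dumps(upgraded, indent=2) + "\n")
    return changed


if __name__ == "__main__":
    sample = {
        "clusters": [
            {"cluster_name": "etl-nightly", "spark_version": "10.4.x-scala2.12"},
            {"cluster_name": "reporting", "spark_version": "15.4.x-scala2.12"},
        ]
    }
    _, changed = upgrade_cluster_specs(sample)
    print(changed)  # ['etl-nightly']
```

Defaulting `upgrade_file` to a dry run lets a reviewer see which clusters would change before any file is rewritten, which suits a CI check that runs on every commit.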

Using Azure DevOps and Databricks Workflows, the team upgraded 523 data files and 7 repositories, validated all functionality after the upgrade, and completed the transition without disruption. Pipelines that had previously failed or run for long periods were stabilized and tuned for performance.
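One way to validate functionality after an upgrade is to compare each pipeline's output and runtime against a pre-upgrade baseline. The sketch below is a hypothetical illustration, not the team's actual process: the `RunMetrics` record, the choice of row count and duration as checks, and the 10% slowdown tolerance are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Hypothetical metrics captured from one pipeline run."""
    row_count: int
    duration_s: float


def validate_upgrade(
    baseline: dict[str, RunMetrics],
    upgraded: dict[str, RunMetrics],
    max_slowdown: float = 1.10,  # assumed tolerance: flag runs >10% slower
) -> dict[str, list[str]]:
    """Map each pipeline with problems to a list of issues; clean pipelines are omitted."""
    problems: dict[str, list[str]] = {}
    for name, before in baseline.items():
        issues: list[str] = []
        after = upgraded.get(name)
        if after is None:
            issues.append("missing post-upgrade run")
        else:
            # Output should be identical before and after a runtime upgrade.
            if after.row_count != before.row_count:
                issues.append(f"row count changed: {before.row_count} -> {after.row_count}")
            # Runtime should not regress beyond the tolerance.
            if after.duration_s > before.duration_s * max_slowdown:
                issues.append(f"slower: {before.duration_s:.0f}s -> {after.duration_s:.0f}s")
        if issues:
            problems[name] = issues
    return problems


if __name__ == "__main__":
    baseline = {"claims_ingest": RunMetrics(1_000, 600.0), "policy_dim": RunMetrics(250, 120.0)}
    upgraded = {"claims_ingest": RunMetrics(1_000, 410.0), "policy_dim": RunMetrics(248, 118.0)}
    print(validate_upgrade(baseline, upgraded))
    # {'policy_dim': ['row count changed: 250 -> 248']}
```

Returning only the failing pipelines keeps the report short enough to post as a CI summary; an empty dictionary means every baseline pipeline passed.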

Benefits:

Improved Performance: Significantly reduced pipeline execution time and improved reliability.
Developer Productivity: Enabled smoother CI/CD workflows and reduced post-deployment issues.
Compatibility & Stability: Resolved known issues with older Spark versions and modernized the infrastructure.
Platform Resilience: The upgraded environment is ready for future enhancements such as ML integration and real-time analytics.