
Why the Electricity Authority moved to a modern, cloud-based data platform

The Electricity Authority (the Authority) promotes competition in, reliable supply by, and the efficient operation of the New Zealand electricity industry for the long-term benefit of consumers.

A key part of this role is maintaining a history of the wholesale electricity market, which started in October 1996. The industry was self-regulated initially. The Electricity Commission was then established in 2003 and replaced with the Electricity Authority in 2010.

Legacy data warehouse

The Authority was running an on-premises, SQL Server-based data warehouse that was showing its age: it lacked scalability and flexibility. Electricity market participants had already begun modernising their systems, and the Authority needed to upgrade to maintain visibility across their data. The Authority struggled to process vast amounts of data and risked not fulfilling its role as an efficient and reliable regulator of the electricity market.

Dealing with large data sets

Every half hour, the Authority refreshes its generation and price history data. Market participants then use this data for analysis and forecasting. Much of the data is republished as the Authority receives more accurate data from market participants, so the data sets are large and multiple versions of each must be managed. For the Authority, this amounts to around 5 terabytes collected over the past 15 years.
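To make the versioning point concrete, here is a minimal Python sketch of picking the most recent version of each half-hourly record when data has been republished. The record layout, field order, and sample values are assumptions made for illustration; they are not the Authority's actual schema or data.

```python
from datetime import datetime

# Hypothetical records: (trading_period_start, version, price).
# The layout and values are illustrative only.
records = [
    (datetime(2021, 1, 1, 0, 0), 1, 100.0),
    (datetime(2021, 1, 1, 0, 0), 2, 101.5),   # republished with corrected data
    (datetime(2021, 1, 1, 0, 30), 1, 98.0),
]


def latest_versions(rows):
    """Keep only the highest-numbered version for each trading period."""
    latest = {}
    for period, version, price in rows:
        current = latest.get(period)
        if current is None or version > current[0]:
            latest[period] = (version, price)
    return latest


current_view = latest_versions(records)
```

Earlier versions stay in `records`, so the full history remains available while `current_view` exposes only the latest figures for each period.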

Upgrading to a scalable, cloud solution

The Authority wanted to upgrade to a modern, cloud-based data platform and approached Theta to run multiple proofs of concept to determine the best platform.

For the regulator of the New Zealand electricity market, a highly stable platform was essential. We needed to make sure that it:

  • Provided flexibility without sacrificing performance or functionality.
  • Allowed data engineering teams to integrate with their clients.
  • Moved away from legacy data sharing methods, e.g. files.
  • Increased adoption of API integration for data.

We recommended Microsoft’s PaaS offering, Azure Databricks, as the most appropriate and cost-effective option to upgrade the existing data warehouse into a modern, scalable, cloud-based data platform.

Azure Databricks now unifies the Authority's data engineering, data analytics, and data science requirements. It strengthens their ability to analyse billions of rows of data and unlock insights across the electricity sector and its performance.

New platform benefits

At 5 terabytes, the platform is one of the largest Lakehouses in New Zealand. It enhances productivity, manages costs, and creates a more efficient infrastructure. With data volume and velocity increasing all the time as electricity suppliers generate more data, the Authority can quickly grow and adapt as needed.

Near real-time data gives them faster access to insights.

The Authority's data engineering and data science teams for Market Monitoring can work collaboratively on the new integrated data platform. Visibility for both teams has improved and silos have been removed.

Their data processing time has dropped significantly, from hours to minutes, thanks to the elasticity and scalability of Azure and Databricks.

Theta also highlighted areas where the Authority could optimise Azure costs, helping them get the most from their new setup. In the long term, there will be cost savings.

Technical

The new solution is built on Azure Databricks: a scalable data integration and data analysis platform.

Support is offered in multiple programming languages:

  • Python and Scala for data engineering workloads.
  • Python and R for machine learning and predictive analytics through MLflow on Databricks.
  • A familiar SQL-based query language for data analysts who need to interrogate data in the platform.

At the core of the implementation is Delta Lake, which provides the Lakehouse capability (data warehousing on top of a data lake). This allows the processed data to scale within a multi-tiered implementation.
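The idea of a multi-tiered implementation can be illustrated with a small sketch in which each tier refines the one before it. The case study does not describe the tiers themselves, so the three tier names (raw, cleaned, aggregated), the field names, and the sample values below are all assumptions. The sketch uses plain Python lists and dictionaries in place of Delta Lake tables, purely to show the shape of the idea.

```python
from collections import defaultdict

# Tier 1 (raw): data as received, including a malformed row.
# Node names, periods, and prices are made up for illustration.
raw = [
    {"node": "A", "period": "2021-01-01T00:00", "price": "100.0"},
    {"node": "A", "period": "2021-01-01T00:30", "price": "110.0"},
    {"node": "B", "period": "2021-01-01T00:00", "price": "n/a"},
    {"node": "B", "period": "2021-01-01T00:30", "price": "90.0"},
]


def to_cleaned(rows):
    """Tier 2 (cleaned): parse prices and drop rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            price = float(row["price"])
        except ValueError:
            continue  # skip malformed rows; the raw tier still holds them
        cleaned.append({**row, "price": price})
    return cleaned


def to_aggregated(rows):
    """Tier 3 (aggregated): average price per node, ready for analysts."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["node"]].append(row["price"])
    return {node: sum(values) / len(values) for node, values in prices.items()}


cleaned = to_cleaned(raw)
aggregated = to_aggregated(cleaned)
```

Keeping the raw tier untouched means later tiers can be rebuilt whenever cleaning or aggregation rules change.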