December 18, 2016
Solving data warehouse problems with Cortana Analytics
Adrian Simpson, Theta Analytics Practice Lead, recently attended the first ever Cortana Analytics Workshop at Microsoft HQ in Redmond. At the workshop he heard from some of the Microsoft developers working on the various components of the Cortana Analytics suite, and had the opportunity to get some hands on experience too.
The Cortana Analytics suite is made up of different components in Azure, allowing you to custom build an analytical application to suit the many differing business requirements out there. Jim Taylor gave an overview in a recent post, focusing on predictive analytics and perceptual APIs. In this post I will focus on the new Azure SQL Data Warehouse and how traditional data warehousing problems can be overcome, opening up analytics to organisations of all sizes.
Problem #1: sizing and setup
Traditionally when developing a new data warehouse one of the first things to do is size and commission the hardware. Sizing a data warehouse for both storage and processing can be difficult as you only know what you know now about your source data and predict the rest. Additionally, hardware purchasing, housing and configuration is expensive and sometimes prohibitive.
Solution: cloud deployment, pay for what you use
Implementing an fully managed Azure SQL Data Warehouse requires no large upfront cost as it is deployed in the cloud in minutes and you pay for what you use. If it is not running then you don’t pay for it. The cost of the SQL Data Warehouse compute depends on the amount of Data Warehouse Units (DWU) available to it. You can start small (100 DWU) and as data volumes increase the compute can increase, or when additional compute is required, for example, during ETL processing, then additional DWUs can be added and then scaled back after the ETL is complete. This provides a fast, simple and elastic way of manging your data warehouse processing needs.
One of the features of Azure is that you pay for your storage and compute separately. This means you pay for the storage you need and the required compute which reduces cost and increases flexibility.
Problem #2: so much data
In today's connected world there are many more data sources available to organisations, such as machinery sensors, mobile devices, wearables, web logs, etc. The Internet of Things (IoT) now allows us to collect and exchange data between systems, providing a rich set of data for analysis. Some of this data is unstructured, in siloes, making it difficult to analyse in its raw form let alone integrating it to gain valuable insights.
Solution: PolyBase to query and join data
Azure SQL Data Warehouse provides PolyBase to query and join data from multiple sources using the familiar TSQL. This opens up the potential to integrate any data with traditional structured sources. This unstructured/semi-structured data can be accessed from other Cortana Analytics components such as an HDInsight (Hadoop) instance, where data is loaded from an Azure Data Factory (ADF) pipeline or data streaming in from Event Hubs and Stream Analytics.
Problem #3: looking backwards only
A data warehouse usually contains a lot of historic data and users get to analyse what has happened in the past - which helps with some analysis - but forward looking predictive and prescriptive analytics are often missing.
Solution: Azure Machine Learning
Once data is integrated in the SQL Data Warehouse, Cortana Analytics provides Machine Learning (ML) that can be used to develop models. ML provides a fully managed service to create predictive analytical solutions. ML gives you the ability to create your own models or use an existing model available in the Cortana Analytics Gallery.
Data in SQL Data Warehouse can be used as a source for the ML model and further data analysis and cleansing can be applied to prepare the data for the model. A range of statistical algorithms can be applied to data sets to train and provide predictive results. These results can be saved back into the SQL Data Warehouse or the model can be exposed via an API.
See also: Jim Taylor's series of posts on Azure Machine Learning, and his first look at Cortana Analytics.
Problem #4: Visualising results
The ability to easily visualise data, quickly modify views and interrogate data can be difficult and time consuming due to the difficulty and effort required to update reports and dashboards. This often requires raising change requests and involving IT.
Solution: Power BI
Power BI provides a seamless way to analyse data by providing SQL Data Warehouse Direct Connect as well as other connectors for familiar data sources.
With Power BI users can create rich interactive reports and real-time dashboards that can be shared in seconds. It comes with a rich set of visualisations or if you have your own then custom visualisations can be used for a personalised reporting experience.
Content packs are an excellent way to package pre-built reports, dashboards, Excel workbooks and datasets to share. These packs can be developed internally for your organisation or you can use one of the many content packs available on the Power BI gallery. Content packs can be developed for each department in an organisation, for example, giving users an easy way to access the reporting and analysis they need.
Cortana Analytics provides a suite of managed services with the capability to ingest data from a wide range of different sources, store and provide compute over large volumes of data and integrate predictive modelling and data visualisations easily. Organisations need to deal with just one vendor, Microsoft, simplifying access to data analytics.
Azure SQL data warehouse is now a real alternative for organisations over standing up a traditional on-premise data warehouse and analytical solution. The speed and cost in standing up an Azure SQL Data Warehouse instance and the other components cannot be ignored.
Organisations can quickly move beyond “what happened?” and “why did it happen?” analysis, to predictive (“what will happen?”), to prescriptive (“what should happen?”) analysis with the seamless integration of ML and Power BI.
Contact us to find out more about maximising your organisation's data assets and moving into the predictive and prescriptive world of analytics.
My next post will look at how organisations can use Azure Data Catalog to help users manage and quickly discover data assets.