Building The Data Warehouse-less Data Warehouse (Part 2 of 2)

Part 1 of this series can be found here.

In my previous post I discussed and explored the feasibility of building a simplified reporting platform in Microsoft Azure that did away with the need for a relational data warehouse. In this article, I proposed that we land, process and present curated datasets (both dimensional files for our data warehouse “layer” and other data assets for our data scientists to work with) within Azure Data Lake Store, with the final step being to product a series of dimension and fact files to be consumed by our semantic layer. The diagram below highlights this approach:


At the end of the previous post I’d produced our data warehouse files (dimensions and facts) and in this second and final part I will show how we consume these files with our semantic layer (Azure Analysis Services) to finally present a business-friendly reporting layer without a data warehouse in size.

Semantic Layer

A semantic layer presents the business view of the data model, and allows users to interact with the layer without needing knowledge of the underlying schema or even knowledge of writing SQL code. In the words of my colleague Christian Wade, it’s “clicky-clicky draggy droppy” reporting that provides a single version of the truth without risk of users creating inaccurate measures by a misplaced join or incorrect Group By statement.

Microsoft’s semantic layer offering is Azure Analysis Services, and allows users to connect to models built with Analysis Services using any compliant tool, such as Power BI and Tableau.

I create an Azure Analysis Services project in Visual Studio, and connect to my Azure Data Lake Store from Part 1 (Ensure you change your model to use the latest SQL Server 2017/Azure Analysis Services Compatibility Level) :


Azure Analysis Services Get Data

In the Query Editor I create queries that pull in the csv files that I created earlier for DimProduct, DimCustomer and FactSales:


Query Editor

Note, whilst it’s relatively straight forward to import csv files into Azure Analysis Services from Data Lake Store, my colleague Kay Unkroth wrote a great article that makes this much easier, and I use this method in my solution. Please see this article for further details.

Once the tables have been imported into Azure Analysis Services, it’s then a simple feat to define our star schema and create a couple of measures:


Simple Star Schema (No Date Dimension Yet!)

We then publish our Analysis Services model to the Azure Analysis Services Server we created in part 1, and connect to it using Power BI:


Power BI Example

That’s it, all done!

Not quite…

Refresh and Orchestration

So we’ve shown now that you can ingest, process and serve data as dimensional constructs using Databricks, Data Lake Store and Analysis Services. However this isn’t at all useful if the pattern can’t be repeated on a schedule. From Part 1, we use Azure Data Factory to copy data from our sources and also to call our Databricks notebook that does the bulk of the processing. With our Analysis Services model now published, we simply need to extend our Data Factory pipeline to automate processing the model.

Logic Apps

There are a few methods out there for refreshing an Azure Analysis Services cube, including this one here. However I particularly like the use of Azure Logic Apps for a code-lite approach to orchestration. Using Logic Apps I can call the Azure Analysis Services API on demand to process the model (refresh with latest data from our data store). The Logic App presents a URI that I can then call a POST against that triggers the processing.

Jorg Klein did an excellent post on this subject here, and it’s his method I use in the following example:


Logic App Example

Once you’ve verified that the Logic App can call the Azure Analysis Services refresh API successfully, you simply need to embed it into the Data Factory workflow. This is simply a matter of using the Data Factory “Web” activity that is used to call the URI obtained from the Logic App you created above:


Logic App Post URL

Our final (simplified for this blog post) Data Factory looks like this, with the Web Activity highlighted.


A simple test of the Data Factory pipeline verifies that all is working.


So, there you have it. My aim in this post was to see if we could create a simplified data warehouse-like approach that did away with a relational data warehouse platform yet still provided the ability to serve the various workloads of a modern data platform. By keeping the data all in one location (our data lake), we minimize the amount of data movement, thus simplifying many aspects, including governance and architecture complexity.

In terms of how we did it:

  1. Ingested data from source systems using Azure Data Factory, landing these as CSV files in Azure Data Lake Store
  2. Azure Databricks was then used to process the data and create our dimensional model, writing back the data files into Azure Data Lake Store
  3. Azure Analysis Services ingested the the dimensional files into its in-memory engine, presenting a user friendly view that can be consumed by BI tools
  4. Refresh of the Analysis Services model was achieved using Azure Logic Apps, with this component being added to our data pipeline in Azure Data Factory

Is This A Viable Approach?

Simply put, I believe the approach can work, however I think it is definitely dependent on specific scenarios. You can’t, or at least, not very easily, create “traditional” data warehouse elements such as Slowly Changing Dimensions in this approach. The example proposed in these articles is a simple star schema model, with a “rebuilt-every-load” approach being taken as our data sets are very small. For large, enterprise scale data warehouse solutions you need to work in different ways with Data Lake Store than we would do with a traditional data warehouse pipeline. There are many other factors to discuss that would affect your decision but these are out of scope for this particular article.

So, can we build a datawarehouse-less data warehouse?

Yes we can.

Should we build them this way?

It depends, and it’s definitely not for everyone. But the joy of cloud is you can try things out quickly and see if they work. If they don’t, tear it down and build it a different way. One definite benefit of this particular solution is that it allows you to get started quickly for an alpha or POC. Sure you might need a proper RDBMS data warehouse further down the line, but to keep things simple get the solution up and running using an approach such as suggested in this article and “back fill” in with a more robust pipeline once you’ve got your transformation code nailed down.

Happy building.

Further Reading

Azure Analysis Services With Azure Data Lake Store

Process Azure Analysis Services Using Logic Apps

Operationalize Databricks Notebooks Using Azure Data Factory

Building The Data Warehouse-less Data Warehouse (Part 1 of 2)

*Update – part 2 of this series is now live, and can be found here*

In times of yore, those who maketh the houses of data would bring forth vast swathes of tables and hurl them down in the forts of staging. Here the ancient priests of Ee, Tee and El would perform arcane magicks and transform these rows of chaos into purest order. This order would be rebuilt into the fabled Data Warehouses, and all who looked upon them would gasp in awe and wonder.

But that was then. Today’s data world is different, isn’t it? Data is varied. It’s big and small, fast and slow. It’s tougher to wrangle and make into shapes fit for use. But through all of this, we still need the hallowed, fabled, data warehouse.

Or do we?

This post is basically me exploring a different approach to building a data warehouse that’s based on the data lake paradigm and a form of ELT (Extract, Transform and Load), but leaves out the actual data warehouse part. I’m still going to build a star schema, but it’s going to be file based, using modern data engineering tools to do the munging and “schema-tizing” before sucking into a semantic layer for reporting. It’s also me exploring the capabilities of my current favourite tool – Azure Databricks.

What are the potential benefits of this approach? A few spring to mind, such as cost, flexibility and simplicity. By keeping all the processing within the data lake means it’s easier to control and govern, and the reduced data movement (you’re not copying into a data warehouse) makes an altogether more simpler structure.

Conversely, I’m well aware that this approach brings it’s own challenges. Traditional data warehouse constructs like Slowly Changing Dimensions, Surrogate Keys and other elements of the Kimball checklist will be harder or even not possible with this, so it won’t suit every scenario.

My aim here though was simple – can we build a BI solution without the data warehouse element and is it a viable approach for certain scenarios?

In short, the solution looks like this:


File-based Data Warehouse Conceptual Model

The above construct largely follows the traditional ETL model with data flowing in from source systems, and comprises the following:

  1. Data ingested as raw files into a staging zone in the data lake
  2. Files processed using data engineering platform into cleansed outputs
  3. Scrubbed files then shaped and moved into relevant serving zones. In this example I’ve kept it simple, with one zone for the star schema files and one for a data assets folder that provides cleansed, curated data sets to be consumed by analysts and data scientists.
  4. The star schema files are subsequently loaded into a semantic layer for enterprise reporting and ad hoc slice n’ dice functionality, whilst the asset files are consumed using the variety of tools preferred by today’s analysts.

To bring this to life, I’ll be using a handful of Azure data services for the relevant solution components –


Solution Components

In this example, I’ll be using an Azure SQL Database as my data source with the AdventureworksLT database.


For each of the above, ensure you put everything in the same Azure region.


There are plenty of options in this space that could move data from source to my lake, including ETL tools such as SSIS, Talend and Azure Data Factory. For this example I’m using Azure Data Factory (version 2), with copy activities moving data from my source SQL database and dropping as *.csv files. I’m also taking advantage of the new Databricks functionality built into Azure Data Factory that allows me to call a Databricks Notebook as part of the data pipeline.


Data Factory Copy Activity

The above data factory copies the source tables into my “raw” directory, and from there I process the files, with the end result being to create my dimension and fact files ready to be loaded into Azure Analysis Services.


Load Files Into Spark DataFrames

With this files safely copied into our raw landing zone, I can now extract the source tables directly into Spark DataFrames. An example is shown below for the product table. We create a DataFrame for each source file.


Example DataFrame Creation


With the DataFrames created, I then create temporary SQL tables from them so that we can use SQL code to define my dimension. You can of course manipulate them natively as DataFrames using Python/Scala/R if you’re more familiar with those languages. I’m a SQL dude, and am familiar in building data warehouse routines in SQL code so have opted for that method here.


Create Temporary Views From DataFrames

With these views created we can use good ‘old SQL to create DataFrames that reflect the dimensions and facts we want to load into Azure Analysis Services.


Dimension Example

and fact tables:


Fact Example

When I run my notebook now I have my dimensional and fact tables created as Spark dataframes. In order for them to be consumed by Azure Analysis Services I need to write the dataframes into CSV files that can then be imported into my tabular model.

Output Dimension and Fact Files

Writing files back from Spark into csv wasn’t as seamless as I thought it would be. Whilst the commands are fairly straight forward, as Spark is a distributed system it writes multiple files as an output, including crc and SUCCESS metadata files. It also doesn’t name them as the file you specify, but instead names it based on the partition name.

We don’t need these files, and need a proper, consistent filename, so I wrote some extra code to rename and move the files back under our Data Warehouse directory.


Writing Out To CSV

With all this done, it leaves us nicely with a set of files ready for ingestion into Azure Analysis Services, which is the subject of Part 2 of this series.


Outputted Dimension And Fact Files

Coming in the second and final part of this series…

  • Build an Azure Analysis Services model directly off the dimension and fact files within the Data Lake Store
  • String it all together using Azure Data Factory

Part 2 is now live, and can be found here….

Power BI Premium Vs Analysis Services

What? Is that even a valid question?

At first glance this isn’t an obvious topic, but I’ve had a couple of conversations recently with partners that echoed some of my own thoughts about where and how Power BI Premium sits with Analysis Services.

Let’s recap what Power BI Premium brings in terms of performance enhancements (read about its wider capabilities here):

  • Increased dataset limits
  • Real time cache/pin to memory
  • Secondary model copies
  • Geo-replicas
  • Incremental/Dedicated Refreshes

With the new support for very large data sets, incremental refreshes etc, this has led to the following question being asked:

“If I have Power BI Premium with all these enhancements, why would I need Analysis Services?”

To me, there’s one main reason above all others why you would use Analysis Services in conjunction with Power BI, and you can skip to the summary to see it, but in order to answer this, let’s first look at the two products in more detail:

Power BI

Until recently, we had Power BI Pro (or Free) and Analysis Services. Both had quite a clear, defined place in the BI ecosystem. Power BI is described as the following:

…a suite of business analytics tools to analyze data and share insights”

These tools include Power BI Desktop (a powerful desktop data mashup and reporting tool), Power BI Mobile (mobile apps allowing you to access reports from anywhere) and the Power BI Service (allows you to publish and share reports throughout the organisation).

So, Power BI allows users to ingest data from a variety of sources, mash it up, build some reports and publish them to the Power BI Service for sharing and collaboration.

Power BI Premium is an add on to Power BI Pro that brings dedicated capacity, performance and scalability (see earlier in this article).

Analysis Services

With Analysis Services, you can mashup and combine data from multiple sources, define metrics, and secure your data in a single, trusted semantic data model.

Sound familiar?

In many ways, Power BI and Analysis Services are very similar. In fact, Power BI has at its heart a Tabular Analysis Services engine. When you ingest and mash up data into Power BI Desktop, you’re actually populating a local version of Analysis Services.

In fact with both tools you have very similar features:

  • Create hierarchies
  • Measures
  • Calculated columns
  • Row Level Security
  • Live Connection/Direct Querying

Data Visualization and reporting aside (Power BI does this, Analysis Services doesn’t – it’s not a reporting tool) there are also differences, but the lines have become blurred in recent times.

The Main Difference

The key difference is that with a data model created in Power BI directly, it’s essentially a closed model to Power BI only.  Users can create pages (reports) in Power BI Desktop with multiple visualizations and then publish this workbook to the Power BI Service. Within the Service itself, users can edit the published reports on this dataset as well as adding new ones:


Edit Report In Power BI Service

The point here is that this model and its data can be reused within and by Power BI only. Users must login to Power BI to edit the report, or they can publish a Power BI Desktop file as a central model and connect to it as a Power Service Live Connection. All of this functionality is pure Power BI.

Where Analysis Services differs is that it is essentially an open model. Yes, you create a data model by ingesting data and creating a user-friendly reporting model in exactly the same way you do with Power BI, but that’s where you stop. You publish the Analysis Services model as a central, trusted data source and – and this it the crux of the matter – ANY tool that can submit DAX or MDX can use your Analysis Services model and get exactly the same answer. This is the ultimate “single version of the truth” that is the goal of so many BI projects. You create your central model, publish it, then connect to it from any of the following tools, plus many more:



In the end, Power BI Premium on its own is a compelling option if you’re just a Power BI shop (and Excel, sort of) and nothing else. Your analysts and developers can develop Power BI models in Desktop, and these models can be used to create reports in the Service, or from Power BI Desktop using the Power BI Service Live Connection. My advice in this scenario is to start with Power BI as your data models, especially if your models are small or moderate in size, then consider moving to Analysis Services as a scaling up option. This is an especially appealing option now with the release of the Analysis Services Web Designer, which allows you to publish Power BI workbooks to Analysis Services and create a data model from them. This is an ideal scenario where you’re running a PoC and want to promote it to something more “enterprise”.


Scale Up Power BI to Analysis Services


However, if you want a central data model to act as the single version of the truth and have many different tools in use across the organization, then Analysis Services is your friend:


Power BI & Other Tools With Analysis Services

In conclusion, I recommend Analysis Services as a key component for any relational data warehouse platform in combination with Power BI. Whilst the decision isn’t as clear cut if you only use Power BI, the ability to create a completely abstracted, semantic layer that serves as your golden view of data regardless of what visualization tool is used is a point that can’t be over sold.

Agree? Disagree? Please feel free to comment below.

Related Links

Power BI Premium

Azure Analysis Services

Power BI Service Live Connection

Why Analysis Services






Azure Analysis Services Web Designer

I’m a long time fan of Analysis Services, and this latest feature is a really cool addition to the offering. Currently in Preview, Azure Analysis Services Web Designer offers the following functionality that extends Azure Analysis Services, all through a simple web UI:

  • Add a new logical server in Azure
  • Create a new Analysis Services model from SQL DB, Azure SQL DW and….Power BI workbooks! (More data sources to come)
  • Browse an existing Analysis Services model and add new measures for quick validation
  • Open an existing Analysis Services model as a Visual Studio Project, in Excel, or in Power BI
  • Edit an existing Analysis Services model using TMSL (Tabular Model Scripting Language)

Example – Creating a Model from Power BI Desktop

In this example, I’ve created a simple Power BI workbook (.pbix file) that’s connecting to an Azure SQL DB instance running the sample AdventureWorks database:

Simple Report

Simple Power BI Report

To connect to the new service, I can access via the Azure portal or go direct to

AAS Web Designer Portal

Azure Analysis Services Web Designer Portal

On this occasion, I’ll use the server I already have and instead go straight to adding a new model:


Importing a Model from Power BI

Once created, I can browse the model in the Designer:


Browse Model

Or open it using Power BI, Excel or Visual Studio:

Open Options

Open Model Options

Pbi From AAS Designer

Power BI From Published Model

This is in preview right now, with many features still to come. The ability to import Power BI models into Analysis Services is a massive feature in its own right, but aside from that it already shows how you can quickly create and modify models without having to delve into Visual Studio/SQL Server Data Tools. New features are coming every month, so keep on eye on the team blog (below) to follow its progress.

Further Reading

Introducing Azure Analysis Services Web Designer

Analysis Services Team Blog