Migrating Your SQL Server Estate To Azure

Many customers want to move their SQL Server databases into the Azure cloud. Whilst not discussed here, moving to a PaaS (Platform as a Service) database platform provides many benefits over managing your own on-premises or IaaS (Infrastructure as a Service) SQL Server databases.

Whilst many want to, the process hasn't traditionally been easy. There are many steps to go through, including identifying what you need to migrate, assessing compatibility with newer SQL Server versions, and making any necessary changes – and that's before you even attempt the migration itself! I've had several DBA roles in the past, and the challenge is all too real.

Thankfully, there's a handful of tools in the Azure space that can make this whole process a lot easier. In this blog I'll touch on each of these – what to use and when – and provide a high-level process that links them all together.

A lot of the information I'm going to cover can be found in the excellent Azure Data Migration Guide, which is recommended reading for any migration to Azure. It also covers other sources, such as MySQL and PostgreSQL, as well as good old SQL Server.

[Image: Data Migration Process (Azure Data Migration Guide)]

Discover

In this phase you are looking to identify and inventory your SQL Server estate so that you know what you actually have running. The Microsoft Assessment and Planning (MAP) Toolkit is a great free tool to get started with, and will give you a breakdown of the SQL Server editions and features in use.

[Image: Sample MAP Summary (Data Migration Guide)]


Assess

Now that we've documented our estate and know what we need to migrate, we need to assess these source databases for any breaking changes or deprecated features that would need addressing before attempting to move to the latest version of SQL Server.

In our kit bag this time we have the Data Migration Assistant (DMA). Newly updated (October 2018) to include Azure SQL Managed Instance as a target, you simply point DMA at your source database, choose "Assessment", and it will flag any SQL Server feature parity or compatibility issues that you need to be aware of.

[Image: Data Migration Assistant]

Migrate

Okay, so we've run our assessment using the DMA and fixed or amended any potential blockers to moving our source database into our target SQL database. We're now ready to move on to the final part of the migration: the actual move itself.

There are several methods of doing this (depending on your source and target), from the good old backup, copy and restore through to more advanced methods. You can also use DMA itself to perform a migration; however, it's better suited to singleton migrations or to testing ahead of the real thing.
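
By way of illustration, here's a minimal sketch of that simple backup and restore route, driven from Python with pyodbc. Everything in it (server names, credentials, database name and storage URL) is a hypothetical placeholder, and it assumes you've already created a SAS-based credential for the storage container on both instances:

```python
# A minimal sketch of the backup/copy/restore route, driven from Python via pyodbc.
# Server names, credentials, database name and the storage URL are hypothetical
# placeholders, and it assumes a SAS-based credential for the storage container
# has already been created on both the source and the target instance.
import pyodbc

BACKUP_URL = "https://mystorageacct.blob.core.windows.net/backups/AdventureWorks.bak"

def run(server, user, password, sql):
    # autocommit is required for BACKUP/RESTORE statements
    conn = pyodbc.connect(
        f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE=master;"
        f"UID={user};PWD={password}",
        autocommit=True,
    )
    conn.execute(sql)
    conn.close()

# 1. Back up the source database straight to Azure Blob Storage
run("onprem-sql01", "backup_operator", "<password>",
    f"BACKUP DATABASE AdventureWorks TO URL = '{BACKUP_URL}' WITH COMPRESSION")

# 2. Restore it on the target, e.g. an Azure SQL Managed Instance
run("my-managed-instance.database.windows.net", "sqladmin", "<password>",
    f"RESTORE DATABASE AdventureWorks FROM URL = '{BACKUP_URL}'")
```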

For production-scale migrations we recommend the Azure Database Migration Service (DMS), a fully managed migration service that not only lets us migrate databases from SQL Server in bulk, but also lets us do these migrations online: following the initial restore to the target database, DMS establishes a continuous sync between source and target. This means you don't have to cut over straight away if you're not ready, and can make the move further down the line whilst minimizing the operational impact.

[Image: Example DMS Migration Project]

Summary

In this article I've barely skimmed the surface of some of the processes and tools required to migrate your databases into Azure SQL. However, with the Azure Data Migration Guide, coupled with the Data Migration Assistant and the Database Migration Service, you've now got an excellent kit bag to make that challenging migration just a little bit easier (and less stressful!).

Useful Reading/Links

Azure Data Migration Guide

MAP Toolkit

Data Migration Assistant

Database Migration Service

Choosing Your Database Migration Path (Whitepaper)


Loading Multiple Tables With Azure Data Factory

There aren't many articles out there that discuss Azure Data Factory design patterns. One big concern I've encountered with customers is that there appears to be a requirement to create multiple pipelines/activities for every table you need to copy. This conjures up images of massive, convoluted data factories that are a nightmare to manage.

Whilst the ForEach functionality has existed in ADF V2 for a while now, it's recently been enhanced to enable mass copying of tables without having to specify target Linked Services and datasets for each individual table.

In this simple post I’m going to show you how to get started with this new functionality.

Using the Copy Data tool (and the Pipeline Editor), you specify your source data store and select multiple source tables; when you then specify the target data store, Data Factory generates a single ForEach activity that in turn calls a Copy activity parameterized for each source table.

To get started we need an Azure Data Factory created, along with a source and a target. In this example, I've used an Azure SQL Database with the AdventureWorks sample database as my source, and Azure Blob Storage as my target. See the following for assistance in getting set up –

Create A Data Factory

Create An Azure SQL Database

Create An Azure Blob Storage Account

With that all done, we launch our newly created Data Factory from the Azure portal, and select the Copy Data wizard –


[Image: Copy Data]

This takes us to the Copy Data wizard. Fill in the task name and leave the rest as is.

[Image: Give The Pipeline A Name]

Click Next, and click “Create New Connection” to create our Source Linked Service –

[Image: Choose A Source Data Store]

For this example, choose “Azure SQL Database” –

[Image: Linked Services]

With your Linked Service chosen, populate your source details (server, credentials etc.) from the SQL Database you created earlier and click "Continue". This takes you to the "Select Tables" screen, where you select all the tables you wish to copy –

[Image: Choose Tables For Extraction]

Define your destination data store in the same way you created the source data store. In this example I'm using Azure Blob Storage as part of an ELT (Extract, Load & Transform) pipeline; the storage account is called "staging" in my example and acts as the landing/staging area for incoming data.

[Image: Destination Data Store]

In your Destination Linked Service, complete the necessary details as prompted. As I'm using Blob Storage, I simply have to designate the path, which in this instance is my "raw" container within my landing area. If the destination were a database, this screen would instead ask me to map source tables to target tables.

[Image: Output Destination (This Screen Will Vary Depending On Your Destination)]

Finish the wizard and run the completed pipeline –

[Image: Final wizard screen]

Once it's completed, we can see in our blob storage landing area that all the files have been successfully copied.

[Image: Storage account view of the copied files]

So far, so amazing, right? The even better bit is when we go to view the Data Factory pipeline we created, and instead of seeing loads of Copy activities, we simply see one, innocuous ForEach activity –

[Image: Just One Activity – What Witchcraft Is This?]

Expanding "Activities" shows that just one Copy activity has been created, and it is in fact parameterized, with the relevant values (source table, target destination etc.) passed in as a JSON array.
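
To make that a little more concrete, here's an illustrative sketch of the kind of items array the wizard generates, expressed as a Python literal rather than raw Data Factory JSON. The table names, file names and property names are hypothetical and will vary with your source and sink:

```python
# An illustrative (hypothetical) representation of the items array that the
# Copy Data tool passes to the generated ForEach activity. In the pipeline
# definition this is stored as JSON; each item drives one iteration of the
# single, parameterized Copy activity.
copy_items = [
    {"source": {"table": "SalesLT.Customer"},
     "destination": {"fileName": "SalesLT.Customer.txt"}},
    {"source": {"table": "SalesLT.Product"},
     "destination": {"fileName": "SalesLT.Product.txt"}},
    {"source": {"table": "SalesLT.SalesOrderHeader"},
     "destination": {"fileName": "SalesLT.SalesOrderHeader.txt"}},
]

# Conceptually, the ForEach activity then does the equivalent of:
for item in copy_items:
    print(f"Copy {item['source']['table']} -> {item['destination']['fileName']}")
```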


[Image: ForEach Activity Parameters]

Which then instructs our Copy activity what to do –

[Image: Copy Activity Source]


And we're done. Much simpleness. For those of you who have gone through the pain of creating endless packages in SSIS for simple copy/1:1 data flows, and have previously been creating similar numbers of Copy activities in Data Factory, hopefully this new release will make your ETL/ELT development a much simpler process.

Happy Factory-ing!


Building The Data Warehouse-less Data Warehouse (Part 2 of 2)

Part 1 of this series can be found here.

In my previous post I discussed and explored the feasibility of building a simplified reporting platform in Microsoft Azure that did away with the need for a relational data warehouse. In that article, I proposed that we land, process and present curated datasets (both dimensional files for our data warehouse "layer" and other data assets for our data scientists to work with) within Azure Data Lake Store, with the final step being to produce a series of dimension and fact files to be consumed by our semantic layer. The diagram below highlights this approach:

[Image: Conceptual model of the data warehouse-less approach]

At the end of the previous post I'd produced our data warehouse files (dimensions and facts), and in this second and final part I will show how we consume these files with our semantic layer (Azure Analysis Services) to finally present a business-friendly reporting layer without a data warehouse in sight.

Semantic Layer

A semantic layer presents the business view of the data model, and allows users to interact with it without needing knowledge of the underlying schema, or even of how to write SQL. In the words of my colleague Christian Wade, it's "clicky-clicky draggy droppy" reporting that provides a single version of the truth, without the risk of users creating inaccurate measures through a misplaced join or an incorrect GROUP BY statement.

Microsoft's semantic layer offering is Azure Analysis Services, which allows users to connect to models built with Analysis Services using any compliant tool, such as Power BI or Tableau.

I create an Azure Analysis Services project in Visual Studio and connect to my Azure Data Lake Store from Part 1 (ensure you change your model to use the latest SQL Server 2017/Azure Analysis Services compatibility level):

[Image: Azure Analysis Services Get Data]

In the Query Editor I create queries that pull in the CSV files I created earlier for DimProduct, DimCustomer and FactSales:

[Image: Query Editor]

Note: whilst it's relatively straightforward to import CSV files into Azure Analysis Services from Data Lake Store, my colleague Kay Unkroth wrote a great article that makes this much easier, and I use his method in my solution. Please see that article for further details.

Once the tables have been imported into Azure Analysis Services, it’s then a simple feat to define our star schema and create a couple of measures:

[Image: Simple Star Schema (No Date Dimension Yet!)]

We then publish our Analysis Services model to the Azure Analysis Services Server we created in part 1, and connect to it using Power BI:

[Image: Power BI Example]

That’s it, all done!

Not quite…

Refresh and Orchestration

So we've now shown that you can ingest, process and serve data as dimensional constructs using Databricks, Data Lake Store and Analysis Services. However, this isn't much use if the pattern can't be repeated on a schedule. From Part 1, we use Azure Data Factory to copy data from our sources and also to call the Databricks notebook that does the bulk of the processing. With our Analysis Services model now published, we simply need to extend our Data Factory pipeline to automate processing of the model.

Logic Apps

There are a few methods out there for refreshing an Azure Analysis Services model, including this one here. However, I particularly like the use of Azure Logic Apps for a code-lite approach to orchestration. Using Logic Apps I can call the Azure Analysis Services API on demand to process the model (i.e. refresh it with the latest data from our data store). The Logic App exposes a URI that I can POST to in order to trigger the processing.

Jorg Klein did an excellent post on this subject here, and it’s his method I use in the following example:

[Image: Logic App Example]
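
Under the hood, the Logic App is essentially issuing an HTTP POST against the Azure Analysis Services asynchronous refresh REST API. As a minimal sketch (the region, server and model names below are hypothetical, and obtaining the Azure AD bearer token is out of scope here), the call looks something like this in Python:

```python
# A minimal sketch of the refresh call the Logic App makes on our behalf, using
# the Azure Analysis Services asynchronous refresh REST API.
# The region, server and model names below are hypothetical placeholders, and
# acquiring the Azure AD bearer token (e.g. via a service principal) is omitted.
import requests

region = "uksouth"            # hypothetical: the region your AAS server lives in
server = "myaasserver"        # hypothetical server name
model = "AdventureWorksDW"    # hypothetical model (database) name
token = "<azure-ad-bearer-token>"

url = f"https://{region}.asazure.windows.net/servers/{server}/models/{model}/refreshes"

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"Type": "Full", "CommitMode": "transactional"},
)

# A 202 Accepted response includes a Location header that can be polled
# for the status of the asynchronous refresh operation.
print(response.status_code, response.headers.get("Location"))
```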

Once you've verified that the Logic App can call the Azure Analysis Services refresh API successfully, you simply need to embed it into the Data Factory workflow. This is a matter of adding a Data Factory "Web" activity that calls the URI obtained from the Logic App you created above:

[Image: Logic App Post URL]

Our final (simplified for this blog post) Data Factory looks like this, with the Web Activity highlighted.

[Image: Data Factory pipeline with the Web activity highlighted]

A simple test of the Data Factory pipeline verifies that all is working.

Conclusion

So, there you have it. My aim in this post was to see if we could create a simplified data warehouse-like approach that did away with a relational data warehouse platform yet still provided the ability to serve the various workloads of a modern data platform. By keeping the data all in one location (our data lake), we minimize the amount of data movement, thus simplifying many aspects, including governance and architecture complexity.

In terms of how we did it:

  1. Ingested data from source systems using Azure Data Factory, landing these as CSV files in Azure Data Lake Store
  2. Azure Databricks was then used to process the data and create our dimensional model, writing the data files back into Azure Data Lake Store (a simplified sketch of this step follows the list)
  3. Azure Analysis Services ingested the dimensional files into its in-memory engine, presenting a user-friendly view that can be consumed by BI tools
  4. Refresh of the Analysis Services model was achieved using Azure Logic Apps, with this component being added to our data pipeline in Azure Data Factory
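
To give a flavour of step 2, here's a simplified, hypothetical PySpark sketch of the kind of notebook code involved. The paths, file names and columns are illustrative only, and the real notebook from Part 1 does considerably more:

```python
# A simplified, hypothetical sketch of the Databricks step: read the raw CSV
# extracts landed by Data Factory, shape a dimension and a fact, and write them
# back to the lake for Analysis Services to pick up.
# 'spark' is the SparkSession provided by the Databricks notebook environment;
# the paths below assume the lake is mounted at /mnt/datalake.
raw_path = "/mnt/datalake/raw"
curated_path = "/mnt/datalake/curated"

customers = spark.read.csv(f"{raw_path}/SalesLT.Customer.txt", header=True, inferSchema=True)
orders = spark.read.csv(f"{raw_path}/SalesLT.SalesOrderHeader.txt", header=True, inferSchema=True)

# Build a simple customer dimension
dim_customer = customers.select("CustomerID", "FirstName", "LastName", "CompanyName")

# Build a simple sales fact keyed on the customer dimension
fact_sales = (orders
              .join(dim_customer, "CustomerID", "inner")
              .select("SalesOrderID", "CustomerID", "OrderDate", "TotalDue"))

# Write the dimensional files back to the lake as single CSV files
dim_customer.coalesce(1).write.mode("overwrite").csv(f"{curated_path}/DimCustomer", header=True)
fact_sales.coalesce(1).write.mode("overwrite").csv(f"{curated_path}/FactSales", header=True)
```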

Is This A Viable Approach?

Simply put, I believe the approach can work; however, it is definitely dependent on the scenario. You can't, or at least not very easily, create "traditional" data warehouse elements such as Slowly Changing Dimensions with this approach. The example proposed in these articles is a simple star schema model, with a "rebuilt-every-load" approach taken because our data sets are very small. For large, enterprise-scale data warehouse solutions you need to work with Data Lake Store in different ways than you would with a traditional data warehouse pipeline. There are many other factors that would affect your decision, but these are out of scope for this particular article.

So, can we build a data warehouse-less data warehouse?

Yes we can.

Should we build them this way?

It depends, and it's definitely not for everyone. But the joy of cloud is that you can try things out quickly and see if they work. If they don't, tear it down and build it a different way. One definite benefit of this particular solution is that it allows you to get started quickly for an alpha or POC. Sure, you might need a proper RDBMS data warehouse further down the line, but to keep things simple, get the solution up and running using an approach such as the one suggested in this article, and "back fill" it with a more robust pipeline once you've got your transformation code nailed down.

Happy building.

Further Reading

Azure Analysis Services With Azure Data Lake Store

Process Azure Analysis Services Using Logic Apps

Operationalize Databricks Notebooks Using Azure Data Factory

Introducing The Azure Data Architecture Guide

Designing data-centric solutions in the cloud can be a challenge. Not only do we have the standard challenge of meeting customer requirements, but we also have to deal with an ever-evolving data landscape of tools, paradigms and platforms. In the good old days of data warehousing, for example, we had relational stores, often coupled with flat files, rogue Access databases and the like, which were fed nicely into another database using an ETL tool and made reporting-ready, often following one of the prevalent design methodologies for these solutions (Kimball, Inmon, Data Vault etc.).

Fast forward to now (2018), and whilst the data warehouse is far from dead, the modern data platform is a much-evolved beast featuring fast data, slow data, big data, small data, cold data, funny-shaped data, not-funny data – you get the picture. All of it needs to be processed in ways that derive actionable insights and intelligence for us poor humans.

To help with this, Microsoft have released the Azure Data Architecture Guide, and it provides, to quote directly –

…a structured approach for designing data-centric solutions on Microsoft Azure. It is based on proven practices derived from customer engagements.

The guide’s main focus is the difference between relational and non-relational data, the design paradigms of both and the best practices, challenges and processes for designing these kinds of solutions.

It is by no means a deep dive into the nuances of each component, but it gives good, broad coverage of the subject, allowing us to understand and appreciate the core concepts before diving into specific focus areas as required.

Recommended Reading.

Find it here – https://docs.microsoft.com/en-us/azure/architecture/data-guide/

And a PDF can be downloaded here.