Introducing The Azure Data Architecture Guide

Designing data-centric solutions in the cloud can be a challenge. Not only do we have the standard challenge of meeting customer requirements, but we also have to deal with an ever evolving data landscape of tools, paradigms and platforms. In the good old days of data warehousing for example, we had relational stores, often coupled with flat files/rogue Access databases etc, which were then fed nicely into another database using an ETL tool and then made reporting ready, often following one of the prevalent design methodologies for these solutions (Kimball, Inmon, Data Vault etc).

Fast forward to now (2018), and whilst the data warehouse is far from dead, the modern data platform is a much evolved beast featuring fast data, slow data, big data, small data, cold data, funny-shaped data, not-funny – you get the picture. All of it is needed to be processed in a way that can derive actionable insights and intelligence for us poor humans.

To help with this, Microsoft have released the Azure Data Architecture Guide, and it provides, to quote directly –

…a structured approach for designing data-centric solutions on Microsoft Azure. It is based on proven practices derived from customer engagements.

The guide’s main focus is the difference between relational and non-relational data, the design paradigms of both and the best practices, challenges and processes for designing these kinds of solutions.

It is by no means a deep-dive in the nuances of each component, but it gives good, broad content in this subject, allowing us to understand and appreciate the core concepts before diving into specific focus areas as required.

Recommended Reading.

Find it here – https://docs.microsoft.com/en-us/azure/architecture/data-guide/

And a PDF can be downloaded here.

 

Exam Tips For Getting Your MCSA (BI Reporting)

To get your MCSA (BI Reporting) you need to take 2 exams – Analyzing and Visualizing Data With Power BI (70-778) and Analyzing and Visualizing Data with Excel (70-779). I recently took both exams at Microsoft Ready, and thought with them fresh in mind it was a good opportunity to “pass it on” and provide some tips and advice on how to achieve the certification. I wasn’t actually planning to take 70-779, but in a prep session we were informed that 70-779 is very similar to 70-778 so decided to take both.

It was good advice, and I managed to pass both in the same day, achieving my MCSA (BI Reporting) in the process.

mcsa-bi

MCSA (BI Reporting)

With this mind, I’ve added a couple of study recommendations plus some stand out items from both exams that I’d recommend you double down on when preparing.

Study Recommendation 1Take the Analyzing and Visualizing Data with Power BI on Edx. It’s free to take (or audit, as it’s referred to on the site) the course, and it provides an excellent foundation in all the skills tested in 70-778.

Study Recommendation 2 – Take the Analyzing and Visualizing Data with Excel on Edx. This is basically the sister courses of the Power BI, and it provides an excellent foundation in all the skills test in 70-779..

Study Recommendation 3 – Go through the Power BI Guided Learning. Like the above course, this provides an excellent Zero to Hero progression in Power BI, developing the hands on skills that are essential to knowledge retention.

After completing the above, below are some of my specific recommendations based on what I encountered in the exams:

  1. Know your DAX. Especially creating measures, calculated columns and calculated tables. Know the difference between CALCULATE, SUMX and similar operators. Here is a good place to start.
  2. Know your M, specifically around manipulating string values in columns. Check out the M language reference and play around with a few commands to get the feel for it.
  3. Know the difference between Import to Data Model and Import to Worksheet (70-779).
  4. Relationships in the Data Model – Know when you would use Bidirectional Filters, the difference between 1-1 and 1-many joins and specifically the impact of how the filter direction affects the visuals produced. There’s a great whitepaper on this subject here.
  5. Refreshing Power BI models in the Power BI Service – Understand what you can refresh from Onedrive, the difference between DirectQuery and Import modes.
  6. Pivot Tables and Pivot Charts in Excel. Understand the structure (Rows vs Columns vs Values). (70-779)
  7. Slicers and Filters. Swot up on Report Interactions and understand how to to turn this off and on to achieve the desired effect. Go through this tutorial.
  8. Know the main differences between Power BI Pro and Power BI Free and Power BI Premium. Know what each licencing option gives you, and it doesn’t. Start here.
  9. Sharing in Power BI – know how to do it, know the differences between the different sharing options. Read this for a good overview.
  10. Row Level Security – Understand the main concepts of Roles and Rules and how to implement it via the data model.
  11. Data Model sizing and performance best practices. MVP Avi Singh provides a great video on this here.
  12. Understand the kind of data sources supported in Power BI. I’d suggest playing through connecting to and manipulating the following sources to get a feel for the steps you undertake – SQL table, JSON file and Excel workbook.

These are the items I encountered when I took 70-778 and 70-779. There will obviously be some variation in what you will encounter, but hopefully this provides some good pointers in where focus. Hands down though the best preparation is going through both of the MOOCS identified at the beginning. They cover massive chunks of the material tested in the exam so I 100% recommend you work through these as a minimum.

Good luck!

*Addendum* Shout out to my colleague Dustin Ryan who’s produced a similar post for 70-778. I saw this posted on LinkedIn recently when was this one was nearly done so figured I’d still release it. Read both. They can only help you in the long run.

https://sqldusty.com/2017/08/23/preparing-for-microsoft-certification-exam-70-778-analyzing-and-visualizing-data-with-microsoft-power-bi/

 

SQL Server Options In Azure

The data platform options in Azure are vast, and they grow and change with each month. Choosing the right database platform for your workloads can be a challenge in itself. This post aims to give some clarity as to the options available in the SQL Server space.

Before diving in, it’s worth clarifying the underlying platform options to be discussed. SQL Server in Azure comes in both Infrastructure As a Service (IaaS) and Platform As a Service (PaaS) flavors, which each providing different benefits depending on your specific needs. This is best summarized in the graphic below:

sqliaasoptions

SQL Server IaaS & PaaS

Infrastructure As A Service Options

SQL Server In Azure VM

As simple as it sounds, you can run SQL Server “as is”, running the full engine on a Windows or  Linux virtual machine, and also in a Docker container. This option is exactly the same as the one you’d run on premise, with the full SQL Server feature set, including High Availability, Security and various other features depending on the edition you choose.

If you need full control over/access to the OS, need to run apps or agents alongside the database, and basically want to manage all aspects of your solution, then SQL Server IaaS is the right solution.

This SQL Server option can be accessed from the Azure marketplace, with many editions available depending on your needs.

Billing in this model comes in two flavors too, Bring Your Own Licence (BYOL) where you provide your own SQL licence and just pay for the underlying compute/storage, or Pay As You Go where you pay the SQL licence per minute the machine is running for.

Platform as a Service (PaaS) Options

Along with the full control of SQL Server IaaS, you also have several PaaS options. These provide much better total cost of ownership (TCO) as much of the daily administration is handled by the Azure platform, so things like backups, high availability, performance tuning and monitoring are all abstracted away, allowing you to just focus on building/using your application.

 

sqlpassoptions

SQL PaaS Options

SQL Database

The first of the pure PaaS offerings, (sometimes called Database As a Service (DBaaS)), SQL Database offers the power of the SQL Server engine, but without the management overhead that comes with maintaining a full SQL Server instance.

As a DBaaS, SQL Database brings with it many features, including dynamic provisioning and resizing, built in High Availability, automatic backups, point-in-time restore and active geo-replication. As Microsoft assumes much of the daily maintenance work, this frees you up to realize cost or operational benefits that you wouldn’t have experienced with your on-premise or hosted solution.

Unlike the IaaS option, where you choose the virtual machine that your instance will reside on (with associated cpus, ram etc), SQL Database instead comes in different tiers, from Basic through to Premium. Rather than choosing specific hardware items like RAM, SQL Database tiers are measured in Database Throughput Units (DTUs), with the underlying specifications abstracted away.

Read here for more details on how to choose the best tier for your application.

There is a small subset of features not available or not applicable in SQL Database compared to SQL Server. Please check here and here.

SQL Database Managed Instances (Preview)

Designed to minimize the challenges of migrating applications to a SQL Database environment without having to perform application rewrites, SQL Managed Instance is an extension to SQL Database that offers the full SQL Server programming surface and includes several native SQL Server features:

  • Native Backup and Restore
  • Cross Database queries and transactions
  • Security features – Transparent Data Encryption, SQL Audit, Always Encrypted and Dynamic Data Masking
  • SQL Agent, DBMail and Alerts
  • Change Data Capture, Service Broker, Transactional Replication and CLR
  • DMVs, XEvents and Query Store

On top of this, SQL Managed Instance offers full security and isolation, with SQL Managed Instance being behind their own virtual network (vnet) within Azure (this is now also available for SQL Database).

Currently in private preview, with public preview due shortly, SQL Managed Instance is a great way to get started moving data to the cloud that combines the benefits of both PaaS and IaaS SQL Server models without having to make changes to the affected application. Microsoft have also created the Azure Data Migration Service to make the migration as seamless as possible.

SQL Database Managed Instance also provides an added incentive to move to the cloud with the Azure Hybrid Cloud Benefit for SQL Server. This allows you to move your on-premise SQL Servers to Azure, and only pay for the compute and storage. See here for more details.

SQL Database Managed Instance is going to be a real game changer in this space in my opinion. My colleague James Serra has created an excellent deck that goes into more details here.

SQL Database Elastic Pools

Whilst not technically a different type of SQL Database offering, Elastic Pools provides the ability to manage multiple SQL Databases that have a variable and unpredictable workload. SQL Databases in an Elastic Pool are allocated elastic Database Throughput Units (eDTUs) that dynamically scale the databases within the pool to meet the required performance demands.

Elastic Pools are ideal for multi-tenancy environments where the workload can’t be predicted, but you don’t want to have to over provision for those “just in case” moments.

SQL Data Warehouse

SQL Data Warehouse, whilst is “relational” in that it has tables with relations between them, is a different concept to the options above. SQL Data Warehouse is what is known as an MPP (Massively Parallel Processing) solution, designed for heavy duty data processing and querying at scale, the likes you’d see in a data warehouse or similar analytics solution.

SQL Data Warehouse could easily be the subject of a full article all on its own. This platform provides a fully scalable analytics engine where the compute and storage are scaled independently, meaning you’re not fixed into certain configurations that are over specced, thus providing greater TCO benefits. With the underlying Polybase engine supporting it, SQL Data Warehouse is a must-have component in many modern data warehouse architectures within Azure.

Read here for more information about SQL Data Warehouse use cases and design patterns.

Honorable Mentions

This post is specifically around SQL Server options within Azure, however I wouldn’t be doing justice to the platform if I didn’t mention the other database options available, and will be covered in future posts:

  • Cosmos DB – A globally distributed, multi-mode database service that supports SQL, Graph, Cassandra, MongoDB and Table APIs.
  • Azure Database for MySQL – Exactly as it says on the tin. This is another PaaS offering that provisions a MySQL instance.
  • Azure Database for PostgreSQL – As above, this PaaS offering provides PostgreSQL functionality, all managed by the Azure platform.

Which Should I Choose?

There are many factors to consider when choosing a relational database service in Azure, including cost, control, workload type, governance requirements and many more. As a general rule, if you need to have full control over your environment, with a full feature set, then SQL Server 2017 as a virtual machine or container is the way to go. For applications born in the cloud that just need the database engine and don’t want or need the hassle of maintaining the database, then SQL Database or SQL Database Managed Instance are excellent options. Again, if you need that elasticity that comes with hosting a multi tenant or highly variable environment, then SQL Database Elastic Pools is the option for you.

The real game changer now is SQL Managed Instances. Offering a full SQL Server feature set, but with all the benefits of PaaS, this option is great for those looking for a seamless move from on-premise to the cloud.

Further Reading

SQL Database vs SQL Server

SQL Data Warehouse vs SQL Database

SQL Managed Instances