Modern data analytics platforms often follow a hub-and-spoke architecture in which a data hub serves as a central data ingestion area that contains all data, usually in raw format. Data spokes deliver data to the majority of the end users in a data store that meets their specific requirements. The data hub can be built on Azure Data Lake Store (ADLS) and/or Azure SQL Data Warehouse. Data spokes can come in virtually any form, depending on the specific requirements. They are often built on Azure Blob Storage, Azure SQL Database (ASQLDB) and/or Azure Analysis Services (Azure AS). When ADLS is used as the data hub to store data in raw, unstructured format, a SQL DW/DB layer is typically used to deliver structured data to Azure AS models.
These architectures come with many benefits – which are out of scope for this blog post – but they also bring some challenges. Sometimes the speed of business can’t wait for the development of the perfect data spoke model and the data flows to feed it. Sometimes it is not practical to ship all data to spoke data stores due to its size or timing. Or maybe you just want to execute a proof of concept and take some shortcuts.
Fortunately, Microsoft is investing heavily in Azure AS, making it more suitable for these modern architectures. The recent Power Query integration makes it possible to implement lightweight data transformations while loading data into Azure AS models. This makes Azure AS better suited to load data from raw data stores like ADLS, without the need for an ETL process and a structured (SQL) data store in between. Also, almost any data source, including ADLS, is now supported, which makes it possible to load data from virtually anywhere.
In this blog post I will show how you can create multi-source partitioned Azure AS tables. This allows you, for example, to combine cold historical data from an ADLS data hub with hot, current data from an ASQLDB, which could be a source system database. In this scenario there is no SQL layer between ADLS and Azure AS. As stated above, the Power Query functionality in Azure AS is perfectly capable of importing data from CSV files that are stored in ADLS.
The example in this blog post is based on the WideWorldImportersDW database. I have created a view in this database that returns the most recent order data. I have exported the historical order data to a CSV file that I uploaded to ADLS. The SQL view and ADLS CSV file will be used as data sources for a single partitioned Azure AS order table.
The Azure AS ADLS connector in Power Query does not automatically establish an association between the folders or files in the store and the tables in the Tabular model. In other words, you must create each table individually and select the corresponding CSV file in the Query Editor to import it into an Azure AS table. The ADLS connector is explained very well in this MSDN blog post.
To create an ADLS/ASQLDB-sourced partitioned Azure AS table, start by importing data from a file that resides in your ADLS, following the step-by-step guide in the MSDN blog post. To find that section, search the page for “To implement this design in a Tabular model, use the following steps“. After executing the 15 steps, you will have created a tabular model with one table that contains data imported from ADLS.
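For reference, the resulting partition query for the cold data looks roughly like the M sketch below. The account URL, folder and file name are placeholders, and the exact source function generated for you (here assumed to be the ADLS Gen1 `DataLake.Contents` function) may differ depending on your connector version:

```m
// Sketch of the ADLS partition query (cold data).
// <account>, the folder path and the file name are placeholders;
// DataLake.Contents is assumed here – check the query the wizard
// actually generated in your model.
let
    Source = DataLake.Contents("adl://<account>.azuredatalakestore.net/datahub/orders"),
    HistoryFile = Source{[Name="OrderHistory.csv"]}[Content],
    Csv = Csv.Document(HistoryFile, [Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Csv)
in
    Promoted
```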
The next step is to create a new Data Source for the ASQLDB.
Right-click the SQL Data Source and choose Import New Tables.
Select the table or view that contains the hot data and then click Edit to open the Query Editor.
In the Query Editor, click on your table in the Queries pane on the left and then open the View menu and click Advanced Editor.
Copy the query from the Advanced Editor. Next, click Cancel and close the Query Editor.
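The query you copy will look something like the M sketch below. The server, database and view names are placeholders for illustration (I use a hypothetical `Fact.RecentOrders` view here; in your case it is whatever view or table you selected):

```m
// Sketch of the query copied from the Advanced Editor (hot data).
// Server and view names are placeholders for illustration.
let
    Source = Sql.Database("<server>.database.windows.net", "WideWorldImportersDW"),
    RecentOrders = Source{[Schema="Fact", Item="RecentOrders"]}[Data]
in
    RecentOrders
```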
We will use this query to add a partition to the existing ADLS-sourced table. Open the table’s partition settings by right-clicking the table and choosing Partitions…
For this example we will keep things simple and create only two partitions: one for the cold data coming from ADLS and one for the hot data coming from ASQLDB. First, rename the current partition that holds the cold data. Click New to create a new partition for the hot data and give it a meaningful name. Now paste the query and click Validate to check that the query statement is valid. Click OK.
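If you look at the model definition (Model.bim) of a 1400 compatibility level model afterwards, the two partitions end up as M-sourced partition objects on the same table, roughly like this fragment (partition and table names are the ones I chose for this example; the expressions are abbreviated):

```json
"partitions": [
  {
    "name": "Orders - Cold (ADLS)",
    "source": { "type": "m", "expression": "let Source = ... in ..." }
  },
  {
    "name": "Orders - Hot (ASQLDB)",
    "source": { "type": "m", "expression": "let Source = ... in ..." }
  }
]
```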
Now process your table and that’s it. You have created a multi-source partitioned Azure AS table!
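Instead of processing from SSDT, you can also script the processing as a TMSL refresh command, for example from SSMS or an automation job. A minimal sketch, assuming the model database and table are named WideWorldImporters and Orders (adjust to your own names):

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      { "database": "WideWorldImporters", "table": "Orders" }
    ]
  }
}
```

A full refresh processes both partitions; TMSL also lets you target a single partition, which is handy once the cold ADLS partition rarely changes.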