Connect Logic Apps to Data Lake Store using a service principal

Since a couple of months Azure Logic Apps offers an Azure Data Lake Connector that can perform the following actions on an Azure Data Lake Store:

  • Azure Data Lake – Append File (concurrent)
  • Azure Data Lake – Read File
  • Azure Data Lake – Upload File
  • Azure Data Lake – Append File (sequential)
  • Azure Data Lake – Create Folder
  • Azure Data Lake – Delete File
  • Azure Data Lake – List Files

Lately I have noticed several people on forums that are facing “Unauthorized” errors while using the Azure Data Lake connector. In this blogpost I will explain how you can authenticate using a service principal that has been authorized on the Azure Data Lake Store for a specific action. As an example I will show how to create a folder.

Azure Data Lake Store uses Azure Active Directory for authentication. When you use the Logic Apps Azure Data Lake connector, you see that there are two possible ways to authenticate:
000

You can either sign in with an Azure AD account, or you can connect using a service principal, the option I will describe. Both options result in a situation in which your Logic App will provide an OAuth 2.0 token that will be attached to each request that will be made to the Data Lake Store.

So what is a service principal? It is basically an impersonation identity of your application which is registered in Azure Active Directory. Using a service principal is preferred instead of running under your own credentials, because:

  • You can assign permissions to the app identity that are different than your own permissions. Typically, these permissions are restricted to only what the app needs to do.
  • You do not have to change the app credentials if your responsibilities change.
  • You can use a certificate to automate authentication when executing an unattended script.

You will receive a service principal object next to an application object when you register an Azure AD application in the Azure portal. You can create multiple service principals in case you have a multi-tenant application. This is not the case for the example I describe in this blog post but I advise you to read the following article to fully understand the concept of applications and service principals, and the relationship between them: Application and service principal objects in Azure Active Directory (Azure AD)

Now let me guide you through the following steps:

  1. Registering an application in Azure AD
  2. Granting the service principal access to an Azure Data Lake Store
  3. Use the service principal in a Logic App to connect with the Azure Data Lake Store and create a folder.

OverviewLet’s first start with what is required to configure the Azure Data Lake connector in the Logic App to use a service principal:
00

  • Connection Name: Display Name of the API Connection that will be created
  • Tenant: Directory ID property of your Azure Active Directory tenant
  • Client ID: Application ID of your Application Registration in Azure Active Directory
  • Client Secret: Key of your Application Registration in Azure Active Directory


Find out your Azure Active Directory ID (Tenant)

Navigate to the Azure portal, then go to the Azure Active Directory page and click Properties. Now copy the “Directory ID” and store it (Tenant property).
0

Register Application in Azure Active Directory

Navigate to the Azure Active Directory page again. On the Overview page, click “App registrations”
1

Click “New application registration”.
2

Enter a name for your App registration, choose select Web App / API for application type and enter a dummy Sign-on URL as we don’t intend to use Web Authorization.
3

You now probably see a long list of existing App registrations. Search for the one you just created and click it.
4

Copy and save the “Application ID” value, we need it for the Client ID property of the Logic App Data Lake connector.
5

Now click “All settings”, followed by “Keys”.
6A

Enter a key Description and choose a duration. A new key Value will be created. Copy and save it. We need it for the Client Secret property of the Logic App Data Lake connector.
6B

Grant the Application Registration (service principal) access to the Azure Data Lake Store

In this example I will grant the Application Registration write and execute permissions, so the Logic App will be able to create a folder in the Azure Data Lake Store.

In the Azure portal, navigate to your Azure Data Lake Store. Click “Overview” and then “Data Explorer”.
7

Click “Access”.
8A

Click “Add”.
8B

You can now search for Active Directory users or groups, but also for Application Registrations. Search on your App Registration name and click it.
9

To be able to create folders, Write and Execute permissions are required. In this example I grant access to “This folder and all children”, in my case “this folder” is the root (/ Folder). I add these permissions as “An access permission entry”, this means the permissions are applied to the currently existing files and folders. “A default permission entry” will also be applied to folders and files that are added to this folder in the future.
10

The permissions have been granted to the App Registration (service principal).
11

Create the Logic App and use the service principal to connect

Create a new Logic App from the Azure portal. From the online Logic App Designer search for the Azure Data Lake Actions. Choose “Create Folder”.
12

Click “Connect with Service Principal”.
13

Configure the connection using the values we saved in the previous steps.
15

Now enter your Azure Data Lake Store Account Name. This is just the name of your Azure Data Lake Store. Also enter a name for the folder in the Folder Path box.
16

You can now click the Run button to test your Logic App which should finish successfully.
17

When you now check the Data Explorer of your Azure Data Lake Store you will see the new folder has been created.
18

Azure Data Lake Store encryption using Azure Key Vault for key management

Case
You want to create an encrypted Azure Data Lake Store (ADLS)
with a master encryption key that is stored and managed in your own existing Azure Key Vault.

Using this setup, which is showed in the diagram below, all data in your Data Lake Store will be encrypted before it gets stored on disk. To decrypt the data, a master encryption key is required.
In this scenario a “customer managed” key will be used, this means the key is created and managed in your own Azure Key Vault. This as an alternative to a key that is managed and owned by the Data Lake Store service, which is the default. Managing keys in the Key Vault gives additional possibilities like revoking access to the key for the ADLS service identity or even permanently deleting the key from the Key Vault.

clip_image001 

Solution
In this blog post I’ll guide you through to the 3 steps below, all in an automated way using PowerShell scripting and an Azure Resource Manager (ARM) template to create your encrypted ADLS. I plan to blog later about the possibilities that Visual Studio Team Services offers to perform these deployment tasks.

1.       Create new “customer managed” key in existing Azure Key Vault

2.       Create a new ADLS with data encryption enabled

3.       Grant ADLS service principal access to Azure Key Vault and enable Key Vault managed encryption using your “customer managed” key

 

Prerequisites

·       Create Azure Resource Group I have created one named “adls-keyvault-demo” (akd)

·       Create Azure Key Vault if you do not already have one. I have created one named “akd-keyvault”

·      AzureRM 4.1.0. Module from the PowerShell Gallery. Required since we will use the new Enable-AzureRmDataLakeStoreKeyVault PowerShell function

 

PowerShell script

Executing the PowerShell script below creates the new key in your existing Azure Key Vault, it then creates a new ADLS using an ARM template (see below) and finally it will enable Key Vault managed encryption for your new ADLS. The comments in the script give further explanation and messages during execution will be written to the Windows PowerShell console to inform you on what’s happening. Make sure you have at least AzureRM 4.1.0 installed and the account you will use have sufficient permissions.

The following variables are used:

·       subscriptionId – Azure Subscription ID

·       rg – Azure Resource Group name

·       keyVaultUri – Key Vault DNS Name. Check your Key Vault Properties in Azure Portal.

·       keyName – Name of Key Vault key that will be used for the ADLS

·       armTemplateFileAdls – Path of your ADLS ARM template JSON file. You can find the definition below the PowerShell script, copy/paste it into a JSON file and store it on disk

·       adlsName – Name of your ADLS


# Variables; modify 
$subscriptionId = “00000000-0000-0000-0000-000000000000”

$rg = “adls-keyvault-demo”

$keyVaultUri = https://akd-keyvault.vault.azure.net/”

$keyName = “akd-adls-key”

$armTemplateFileAdls = “C:\CreateEncryptedADLS.JSON”

$adlsName = “akdadls”

 

#Authenticate to Azure and set the subscription context

Login-AzureRmAccount

Set-AzureRMContext -SubscriptionId $subscriptionId

 

Write-Host “Get Key Vault Name from URI $keyVaultUri

$keyVaultHost = ([System.Uri]$keyVaultUri).Host

$keyVaultName = $keyVaultHost.Substring(0, $keyVaultHost.IndexOf(‘.’))

 

Write-Host “Creating software-protected key $keyName in Key Vault $keyVaultName

$adlsKey = Add-AzureKeyVaultKey -Destination Software -Name $keyName -VaultName $keyVaultName

 

#Get current Version identifier of key which will be used for the creation the ADLS using the encryptionKeyVersion parameter

$adlsKeyId = $adlsKey.Version.ToString()

 

Write-Host “Create new encrypted ADLS by deploying ARM script $armTemplateFileAdls in resource group $rg

New-AzureRmResourceGroupDeployment -ResourceGroupName $rg -TemplateFile $armTemplateFileAdls `

-DataLakeStoreName $adlsName -KeyVaultName $keyVaultName -DataLakeStoreKeyVaultKeyName $keyName -DataLakeStoreKeyVaultKeyVersion $adlsKeyId

 

#Get the ADLS account and it’s Service Principal Id

$adlsAccount = Get-AzureRmDataLakeStoreAccount -Name $adlsName

$adlsAccountSPId = $adlsAccount.Identity.PrincipalId

 

Write-Host “Grant ADLS account Service Principal $adlsAccountSPName required permissions on the Key Vault”

#Grant ADLS account access to perform encrypt, decrypt and get operations with the key vault

Set-AzureRmKeyVaultAccessPolicy -VaultName $keyVaultName -ObjectId $adlsAccountSPId -PermissionsToKeys encrypt,decrypt,get -BypassObjectIdValidation

 

Write-Host “Enable ADLS Key Vault managed encryption”

Enable-AdlStoreKeyVault -Account $adlsAccount.Name

 

Write-Host “ADLS $adlsName is now encrypted using key $keyName in Key Vault $keyVaultName


 

ARM Template ADLS


{

  “$schema”: https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#”,

  “contentVersion”: “1.0.0.0”,

  “parameters”: {

    “DataLakeStoreName”: {

      “type”: “string”,

      “minLength”: 1

    },

    “KeyVaultName”: {

      “type”: “string”,

      “minLength”: 1

    },

    “DataLakeStoreKeyVaultKeyName”: {

      “type”: “string”,

      “minLength”: 1

    },

    “DataLakeStoreKeyVaultKeyVersion”: {

      “type”: “string”,

      “minLength”: 1

    }

 

  },

  “resources”: [

    {

      “type”: “Microsoft.DataLakeStore/accounts”,

      “name”: “[parameters(‘DataLakeStoreName’)]”,

      “apiVersion”: “2016-11-01”,

      “location”: “North Europe”,

      “tags”: {

        “displayName”: “Datalake Store”

      },

      “identity”: {

        “type”: “SystemAssigned”

      },

      “properties”: {

        “encryptionState”: “Enabled”,

        “encryptionConfig”: {

          “type”: “UserManaged”,

          “keyVaultMetaInfo”: {

            “keyVaultResourceId”: “[resourceId(‘Microsoft.KeyVault/vaults’, parameters(‘KeyVaultName’))]”,

            “encryptionKeyName”: “[parameters(‘DataLakeStoreKeyVaultKeyName’)]”,

            “encryptionKeyVersion”: “[parameters(‘DataLakeStoreKeyVaultKeyVersion’)]”

          }

        }

      }

    }

  ]

}


 

After you successfully execute the PowerShell script, navigate to the Azure portal to check if everything is OK.

Data Lake Store à Settings à Encryption

image

The account is successfully encrypted using the Key Vault key. The ADLS account has a generated Service Principal named “RN_akdadls” which we granted permissions to the Key Vault in the PowerShell script.

 

Key Vault à Settings à Keys

clip_image005

The key has been created and is enabled.

 

Key Vault à Settings à Access policies

clip_image007

The ADLS Service Principal has an access policy that we set with the PowerShell script.

Opening it shows the key permissions:

clip_image009


Special thanks to my Macaw colleague Simon Zeinstra for working together on this solution!