Last week Microsoft released CTP3 of SQL Server Integration Services (SSIS), code-named Denali. In this blog post I will look at the key new features and some of the minor improvements in SSIS Denali.
Shared Connection Managers
Connection Managers can now be shared on SSIS project level.
You can create them in the solution explorer, in the folder Connection Managers:
Once created, they automatically appear in all your SSIS packages. Their names are shown in bold so you can distinguish them from your package-level connection managers:
You can also create Shared Cache Connection Managers for your cached lookups. This will be very useful for lookups that are performed multiple times, for example when you look up dimension tables repeatedly from your fact table ETL. You can convert a local cache connection manager simply by right-clicking it:
What happened to the Shared Data Sources we knew from prior SSIS versions? Well they are no more. Shared Data Sources only lived at design time and not at runtime so they were not really useful anyway. Shared Connection Managers do live at runtime and offer even more possibilities as I will show later on, so they replace the old Shared Data Sources.
Data Flow – Column mappings
SSIS has always mapped columns from sources to transformations and destinations with the help of lineage IDs. Every column had a unique metadata ID that was known by all components in the data flow. If something changed in the source, the lineage IDs would break and raise error messages like: The external metadata column collection is out of synchronization with the data source columns.
To fix this error you would re-map all broken lineage IDs with the “Restore Invalid Column References Editor”.
In Denali, lineage IDs are no longer used. Mappings are done on column names, which is great because you can now auto-map on column names and even copy/paste pieces of another data flow and connect them by mapping the corresponding column names.
Data Flow – Flexible order of authoring
This improvement lets you edit data flow components even when they don’t have an input attached. Theoretically you can build your data flow backwards: start with the destination and track back to the source.
Data Flow – Groupings
You can now group data flow components. Select the components you wish to group, right click and select Group:
The result is some sort of a data flow sequence container:
By clicking the arrow it will collapse:
Data flow groups are 100% eye candy; you can’t set any properties on them.
Data Flow – Data Quality Services Cleansing transformation
With this transformation you can apply data quality rules in the data flow. This is done by using a Knowledge Base, which you can create yourself or download from the Windows Azure Marketplace. For example, you could apply a rule that checks whether a given postal code column is valid for a particular town column in your record.
Data Flow – Data Tap
In Denali, we have the possibility to attach a “tap” at a data flow path (arrow). This tap captures all data coming through and dumps it in CSV files.
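On the server side, a tap is created through the catalog. The sketch below assumes the catalog.add_data_tap stored procedure as it appears in the released SSISDB catalog (the CTP3 surface may differ), and all folder, project, package, and path names are placeholders for my test project:

```sql
-- Create an execution for the package; all names below are placeholders.
DECLARE @execution_id BIGINT;
EXEC SSISDB.catalog.create_execution
     @folder_name  = N'TestFolder',
     @project_name = N'Test',
     @package_name = N'Package.dtsx',
     @execution_id = @execution_id OUTPUT;

-- Attach a tap to one data flow path; the captured rows are dumped to a CSV
-- file in the DataDumps folder of the SSIS installation.
EXEC SSISDB.catalog.add_data_tap
     @execution_id            = @execution_id,
     @task_package_path       = N'\Package\Data Flow Task',
     @dataflow_path_id_string = N'Paths[OLE DB Source.OLE DB Source Output]',
     @data_filename           = N'TapOutput.csv';

EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;
```

Note that the tap has to be added after the execution is created but before it is started.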
Parameters
SSIS Denali is fundamentally different from its predecessors when it comes to the concept and usage of configurations. SSIS package configurations are obsolete (they will still be available if you really want to use them ;-)) and parameters and environments are the new kids in town.
SSIS Parameters look a lot like SSIS variables, but there are some differences. There are two types of parameters:
1. Package Parameters:
Think of Package Parameters as C# parameters, passed as input to a C# function (= your package). You can set them when executing (calling) a package, and their lifetime is limited to the SSIS package itself.
The difference with SSIS variables? You can set the parameters while executing a package with SSMS or an Execute Package Task.
Define your Package Parameters at the Package Parameters tab:
2. Project Parameters:
Identical to Package Parameters except for their scope: Project Parameters exist at project level and can be referenced throughout all SSIS packages in your project, while Package Parameters only live inside a single SSIS package.
You can define Project Parameters in the solution explorer within your SSIS project:
Both Package and Project Parameters can be referenced from your SSIS packages; you can recognize them by their prefix, $Package or $Project:
Setting parameters in an Execute Package Task is achieved with the new Parameter bindings tab:
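Under the hood, executing a package with a parameter value comes down to a few catalog calls. A minimal T-SQL sketch, assuming the catalog procedures of the released SSISDB (folder, project, package, and parameter names are placeholders):

```sql
-- Create an execution and pass a value for a Package Parameter at run time.
DECLARE @execution_id BIGINT;
EXEC SSISDB.catalog.create_execution
     @folder_name  = N'TestFolder',
     @project_name = N'Test',
     @package_name = N'Package.dtsx',
     @execution_id = @execution_id OUTPUT;

-- @object_type 30 = package parameter (20 would be a project parameter).
EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id    = @execution_id,
     @object_type     = 30,
     @parameter_name  = N'MyPackageParameter',
     @parameter_value = N'SomeValue';

EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;
```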
It’s also possible to parameterize SSIS tasks on the Control Flow by right-clicking them and choosing Parameterize:
Loads of capabilities here! I now realize that I have created a workaround for Package Parameters with my SSIS Package design pattern for loading a data warehouse where I (mis)used Package Configurations as Package Parameters. Creating a new package design pattern for Denali definitely goes on my TODO list!
Environments
Environments are collections of SSIS package settings that can be defined on the SSIS Server. At runtime, the selected environment overrides these settings in the SSIS packages. You can create multiple environments, and when you run your SSIS packages you can select which environment to use. It’s also possible to let multiple SSIS projects run under one environment, so there is flexibility all around.
To illustrate the principle of Environments right away, I have created a simple example of a scenario you will commonly encounter: two Environments, one with development settings and one with production settings.
I have deployed a Test SSIS project to SSMS which contains one SSIS package with one Shared Connection Manager. Notice the new Integration Services folder structure:
Next right click Environments and choose Create Environment:
Type Development as name for the Environment and click OK:
Now double click the Development Environment:
Click on the Variables tab and create an Environment Variable named SCM_EnvironmentVar that will override the Shared Connection Manager. Type in the connection string for the development server as the Value and click OK:
Next, create another Environment with the name Production, again with an Environment Variable named SCM_EnvironmentVar. The only difference between the two variables should be the Value: it contains the production server’s connection string instead of the development one.
You now have two Environments with one Environment Variable for the Shared Connection Manager each:
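The same two Environments can also be scripted instead of clicked together. A sketch, assuming the catalog procedures of the released SSISDB (the folder name and connection strings are placeholders):

```sql
-- Create the Development and Production Environments in the project folder.
EXEC SSISDB.catalog.create_environment
     @folder_name = N'TestFolder', @environment_name = N'Development';
EXEC SSISDB.catalog.create_environment
     @folder_name = N'TestFolder', @environment_name = N'Production';

-- One variable per Environment: same name, different connection string.
EXEC SSISDB.catalog.create_environment_variable
     @folder_name      = N'TestFolder',
     @environment_name = N'Development',
     @variable_name    = N'SCM_EnvironmentVar',
     @data_type        = N'String',
     @sensitive        = 0,
     @value            = N'Data Source=DEVSERVER;Initial Catalog=MyDb;Integrated Security=SSPI;';

EXEC SSISDB.catalog.create_environment_variable
     @folder_name      = N'TestFolder',
     @environment_name = N'Production',
     @variable_name    = N'SCM_EnvironmentVar',
     @data_type        = N'String',
     @sensitive        = 0,
     @value            = N'Data Source=PRODSERVER;Initial Catalog=MyDb;Integrated Security=SSPI;';
```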
We now need to reference the Development and Production Environments in the Test project so they can be used. Right click the Test project and choose Configure:
Go to the References page and click Add…:
Both Environments are now referenced from the Test SSIS project and are available. Let’s configure the Shared Connection Manager so it uses the value specified in one of the Environments. Click on the Parameters page, then the Connection Managers tab, and then the … next to the ConnectionString property of the Shared Connection Manager:
Now select Use Environment Variable and select SCM_EnvironmentVar. Click OK:
The name of our Environment Variable is shown in the Value box of the ConnectionString property:
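The reference and the binding can be expressed in T-SQL as well; a sketch under the same assumptions (released SSISDB procedures, placeholder names):

```sql
-- Reference the Development Environment from the Test project.
DECLARE @reference_id BIGINT;
EXEC SSISDB.catalog.create_environment_reference
     @folder_name      = N'TestFolder',
     @project_name     = N'Test',
     @environment_name = N'Development',
     @reference_type   = 'R',  -- relative: the Environment lives in the same folder
     @reference_id     = @reference_id OUTPUT;

-- Bind the Shared Connection Manager's ConnectionString to the variable.
-- @value_type 'R' means "reference an environment variable" (not a literal);
-- connection manager properties are addressed with the CM. prefix.
EXEC SSISDB.catalog.set_object_parameter_value
     @object_type     = 20,  -- project-level object
     @folder_name     = N'TestFolder',
     @project_name    = N'Test',
     @parameter_name  = N'CM.SharedConnectionManager.ConnectionString',
     @parameter_value = N'SCM_EnvironmentVar',
     @value_type      = 'R';
```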
We have now successfully attached the Environment Variable to override the ConnectionString property with a value from either the Development or the Production Environment. Executing the package under one of the two Environments is very easy. To do so, right-click the package and choose Run:
The Run Package dialog box appears and you instantly see a message that reminds you to assign a value to the ConnectionString property of connection manager SharedConnectionManager:
At the bottom of the dialog box you can select the Environment your package should use. Select the .\Development Environment and click OK:
In the overview report you can see that the package has successfully run under the Development Environment:
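Running under an Environment works the same way from T-SQL: look up the reference_id of the Environment reference and hand it to create_execution. A sketch, assuming the released catalog views and procedures (all names are placeholders):

```sql
-- Find the reference to the Development Environment for the Test project.
DECLARE @ref_id BIGINT =
    (SELECT er.reference_id
     FROM SSISDB.catalog.environment_references AS er
     JOIN SSISDB.catalog.projects AS p ON p.project_id = er.project_id
     WHERE p.name = N'Test' AND er.environment_name = N'Development');

-- Create and start an execution that uses that Environment.
DECLARE @execution_id BIGINT;
EXEC SSISDB.catalog.create_execution
     @folder_name  = N'TestFolder',
     @project_name = N'Test',
     @package_name = N'Package.dtsx',
     @reference_id = @ref_id,
     @execution_id = @execution_id OUTPUT;

EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;
```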
If you change the package to use the Production Environment and you run the package again, you’ll get the following execution information:
Pretty straightforward and easy to use. The big difference? All the configuration work is done in SQL Server Management Studio, exactly the place where it should be done. In the current SSIS versions most of this work was done within Visual Studio, and configuration meant editing tables directly. So unfriendly! Big improvement, great work MS!
In this example I have set a property of a Shared Connection Manager, but of course it is also possible to set Package or Project Parameters with Environment Variables.
Deployment
We have had the option to deploy SSIS packages from Visual Studio for a long time thanks to BIDS Helper. Microsoft has finally added this functionality out of the box in Denali. They even did more: you can now also build your projects from within Visual Studio!
There are two ways to deploy (push) your projects to the new SSIS server, directly from Visual Studio or by using an Integration Services Project Deployment File. They both use the new Integration Services Deployment Wizard.
Deployments directly from Visual Studio can be done by right clicking your project and then choose Deploy. Now, the Deployment Wizard will pop straight up:
The other way is to build your project first. This was already possible in SSIS versions prior to Denali but did not really serve a purpose.
In Denali this is different: building your project generates an .ispac file which contains your entire project and can be used by the Deployment Wizard. Again a very nice new feature.
Handing over a new release of an SSIS solution to a customer’s IT department for deployment can now be done in a very professional manner. .ispac files can also be opened by Visual Studio to import your SSIS project.
Building your project can be done by right clicking your project and choosing Build:
The output window displays the following:
When looking at the bin\Development folder we see the new Test.ispac file:
Double clicking (or choosing Deploy in Visual Studio) this file will start the new Integration Services Deployment Wizard:
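The wizard can also run unattended; as far as I can tell it supports a silent mode, so a deployment could be scripted roughly like this (the paths and server name are placeholders, and the exact switches may differ in CTP3):

```cmd
REM Deploy the .ispac to the SSISDB catalog without showing the wizard UI.
ISDeploymentWizard.exe /Silent ^
    /SourcePath:"C:\Projects\Test\bin\Development\Test.ispac" ^
    /DestinationServer:"localhost" ^
    /DestinationPath:"/SSISDB/TestFolder/Test"
```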
You can now select the project that you wish to deploy: either an .ispac project deployment file or an existing project located in an Integration Services catalog. The latter option is useful when you wish to deploy, for example, a TEST SSIS solution that is already located in a catalog to an ACCEPTANCE environment in another catalog.
The project is loaded and validated:
Next, you need to select the destination, which has to be an SSIS Server:
Review your selections and click on Deploy to start the actual deployment:
The results show a successful deployment. Notice the protection level has been changed. What happened here?
The wizard clears the protection level, as we know it from Visual Studio, to prevent “cannot decrypt password” errors.
Instead of using protection levels, passwords are stored in clear text, and the entire package (passwords included) is stored encrypted in the SSIS Server tables you are deploying to.
The project has now been deployed to the server:
When you right-click the Test project and choose Versions you are able to see the current version of your project:
If you deploy a newer version later on, the existing version(s) stay on the server and you can easily roll back to a previous version if you’ve made a bad deployment.
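Version history and rollback are exposed through the catalog too; a sketch assuming the released catalog.object_versions view and catalog.restore_project procedure (the folder/project names and the LSN are placeholders):

```sql
-- List the versions stored for the Test project.
SELECT object_version_lsn, object_name, created_time
FROM SSISDB.catalog.object_versions
WHERE object_name = N'Test';

-- Roll back to one of the listed versions.
EXEC SSISDB.catalog.restore_project
     @folder_name        = N'TestFolder',
     @project_name       = N'Test',
     @object_version_lsn = 12345;  -- placeholder: take a value from the query above
```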
The diagram below shows the entire deployment life cycle (source: TechNet):
SSIS Server
The new SSIS Server is the central storage and administration point for your SSIS solutions. It is no longer a standalone server that you’ll need to manage; basically it is a database with stored procedures.
You now have an Integration Services node available in SSMS when you connect to the database engine:
Under the Integration Services node you will find your SSISDB catalog, which holds all your SSIS solutions with their packages, environments, etc. These objects are physically stored in a SQL Server database with the same name as the catalog:
This database also contains the stored procedures that make up the programming logic of the SSIS Server:
With the help of these stored procedures you can manage your SSIS Server: e.g. set parameter values and connection managers, and override properties by using Environments.
Next to the configuration functionality I discussed earlier, you can also implement security to control access to the catalog, at both folder and package level. When you right-click a folder or package in the catalog and choose Properties, you get the following window where you can manage security access:
Logging
Finally, you get an out-of-the-box reporting dashboard, built on the logging functionality that SSIS now offers. On this dashboard you get information about the execution history of your package and its sub-packages, see which parameters were used, view specific messages, and get a performance report over time.
All this information is logged automatically whenever a package runs on the server. The information is very detailed; you can even get the row counts between SSIS data flow transformations!
When running a package you are able to select a logging level on the Advanced tab:
You can choose for:
– None: turns logging off, for performance reasons
– Basic: error and warning logging
– Performance: detailed trace information
– Verbose: diagnostics and fault debugging
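All of this logging lands in catalog views, so you can query it yourself as well as browse the dashboard. A sketch assuming the released catalog.executions and catalog.execution_data_statistics views (the execution_id is a placeholder):

```sql
-- The most recent executions on the server.
SELECT TOP (10) execution_id, package_name, status, start_time, end_time
FROM SSISDB.catalog.executions
ORDER BY execution_id DESC;

-- Row counts per data flow path for one execution
-- (populated at the Performance and Verbose logging levels).
SELECT task_name, dataflow_path_name,
       source_component_name, destination_component_name, rows_sent
FROM SSISDB.catalog.execution_data_statistics
WHERE execution_id = 12345;  -- placeholder execution_id
```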
When I run my Test package I get the following execution dashboard, with hyperlinks that navigate to more detailed reports:
So there is no need to create your own logging framework anymore; it’s all out of the box!