SSIS – Lookup is case sensitive

Posted by

A while ago I figured out that the lookup transformation is case sensitive.
I used a lookup to find dimension table members in for my fact table records. This was done on a String business key like ‘AA12BB’. I attached a table for the error output and after running the package I found one record in this table.This record had a business key like ‘Aa12BB’. I searched the dimension table for this missing record and it surprised me, it DID exist but with the following business key: ‘AA12BB’. It seemed the lookup transformation is case sensitive. Next thing I tried was a T-SQL query in the management studio of SQL Server 2005. In the WHERE clause I referred to the business key: ‘Aa12BB’. The query returned the record with business key ‘AA12BB’. Conclusion: SQL Server is not case sensitive but the SSIS lookup component IS case sensitive… Interesting.

Solution:
After some research I found a few solutions for this interesting feature of the lookup transformation. Before I explain these solutions you must know something about the inner working of the lookup component.

A lookup transformation uses full caching by default. This means that the first thing it does on execution, is loading all the lookup data in its cache. When this is done it works as expected, but with case sensitivity.

The solution is to set the CacheType property of the lookup transformation to Partial or None, the lookup comparisons will now be done by SQL Server and not by the SSIS lookup component.
Another solution is to format the data before you do the lookup. You can do this using the T-SQL LOWER() or UPPER() functions. These functions can be used in a query or for example in a derived column SSIS component.

3 comments

  1. To avoid lookup failures that are caused by case differences in data, first use the Character Map transformation to convert the data to uppercase or lowercase. Then, include the UPPER or LOWER functions in the SQL statement that generates the reference table

    Like

  2. I like Amit’s solution.  My approach is: store the data in the dimension just as it comes in (whatever case).  But transform it (and the input data) to UPPER for the purposes of comparison (lookup).
    If you’re unlucky enough to have SQL running with a case-sensitive collation order, then all bets are off…
    The exception to this approach is trailing/leading spaces.  I don’t want that nonsense in my dimension, so I LTRIM(RTRIM( the input before saving it to the dimension.  This does mean the input fact data has to be LTRIM(RTRIM(-ed for the lookup.

    Like

Leave a comment