Biml, Microsoft Technologies, SSIS

Biml for a Task Factory Dynamics CRM Source Component

I recently worked on a project where a client wanted to use Biml to create SSIS packages to stage data from Dynamics 365 CRM. My first attempt using a script component had an error, which I think is related to a bug in the Biml engine with how it currently generates script components, so I had to find a different way to accomplish my goal. (If you have run into this issue with Biml, please comment so I know it’s not just me! I have yet to get Varigence to confirm it.) This client owned the Pragmatic Works Task Factory, so we used the Dynamics CRM source to retrieve data.

I was ultimately creating a bunch of packages that looked like the below.

There are two pieces to using the Task Factory Dynamics CRM Source: the source component and the connection manager. The code for both is below.

Things to note for the source component:

  •  This is pulling data for an entity in Dynamics 365. If you were to populate this by hand, the Task Factory UI would ask you which entity. There are several entities that are visible, but do not allow you to retrieve the data. You can find a list of internal entities here.  You won’t want to stage data from those tables. I generated all of my packages at once with Biml, using a Blacklist to exclude the internal entities.
  • The entity name that should be the value on line 21 is the entity logical name.
  • I’m iterating through columns in tables in a separate file and holding them in memory, hence the references to Table.columns. You’ll need to write some code to get the entity metadata to feed that, or find another way to provide the table names and column names and data types.
  • You must set the ErrorRowDisposition and TruncateRowDisposition to “NotUsed” as I did on line 10 for each column or your package will not work.
  • In OutputColumns and ExternalColumns collections, I just have if statements to check data types. There may be a more elegant way to do this, but this worked for me and I went with it. Feel free to leave suggestions in the comments if you have a better idea.
  • The Connection element on line 145 should keep the name “DYNAMICSCONNECTION”. If you change it, the connection won’t work. In my actual project, my connection manager is in a separate file and is named “TF_DynamicsCRM”. You can set the ConnectionName property to whatever you want, as long as it matches your connection definition (Line 153 in the gist).

Things to note for the Connection Manager:

  • I tried reverse engineering the connection manager in Biml Studio, and that got me close, but it didn’t get the ObjectData property quite right. I ended up having to create one manually, view the code for it in SSDT, and copy it into my Biml file. I have my packages and project set to EncryptSensitiveWithPassword, which is why you see the p4:Salt value on line 158. Your connection manager will have different values for lines 158, 159, and 160. If you set your project to EncryptSensitiveWithPassword, this value will stay consistent across developers. But if you use EncryptSensitiveWithUserKey, the value may change, which will be fun for regenerating with Biml. So make sure you plan for that if you have multiple developers or multiple computers on which you are working.
  • This connection manager is set to connect to Dynamics 365, hence the ServerHost=disco.crm.dynamics.com. If you have an on-prem or hosted environment that isn’t 365, you’ll have to find the correct ServerHost value and put it in line 165.

I hope that helps someone looking to generate the Task Factory Dynamics Source with Biml.

Azure Data Factory, Biml, Conferences, SSIS

I’m Speaking at IT/Dev Connections 2017

I’m pleased to say that I am speaking at IT/Dev Connections 2017. This year the conference will be held in San Francisco October 23-26. I had a great experience speaking at IT/Dev Connections in 2015, so I am excited to return again this year.

This conference is special to me because of its focus on providing great content for developers and IT pros – the conference website describes it as the “anti-keynote” conference with no forced marketing content.

I also enjoy it because it is more than just SQL Server/Data Platform (they have tracks for Cloud & Data Center, Enterprise Collaboration, Development & Dev Ops, and Enterprise Mobility and Security), and it’s nice to get out of my comfort zone a bit.

I will deliver two sessions at the conference.

Azure Data Factory in A Nutshell

If you have been wanting to get into Azure Data Factory (ADF) development, join me for this demo-filled overview. In this session, we’ll go over the basic anatomy of an ADF solution. You’ll learn what ADF is and isn’t as we walk through a solution to pull data from an on-premises SQL Server database to a blob storage and then populate and Azure SQL Data Warehouse. You’ll learn tips for creating ADF solutions in Visual Studio, and I’ll show you how to make ADF development less tedious with a free Visual Studio Add-in called Biml Express. You’ll leave with a basic understanding of ADF and a list of tools and skills you’ll want to acquire as you begin your ADF development.

Improve Data Warehouse ETL Delivery with a Patterns-Based Approach

What if I told you that 90% of your data integration development in SQL Server could be automated? In 5 years, you will be “old fashioned” if you are hand coding SSIS packages. Developers with different skill levels and design preferences create databases and SSIS packages however they see fit to get the job done. Documentation is frequently omitted. Maintenance and small enhancements consume too much development time while manual errors and inconsistencies slip through the testing and release process. You can use tools and frameworks to rearrange the development process and alleviate these common problems. The implementation and automation of design patterns leads to improved efficiency and communication. Join me in this session to learn how to use Business Intelligence Markup Language (Biml) and Excel to facilitate metadata-driven SSIS development. I’ll use database schema information plus Excel inputs to implement a small data mart from staging through the dimensional model.

I hope you will join me in San Francisco in October!

Azure, Azure Data Factory, Azure Data Lake, Biml, Microsoft Technologies

Copying data from On Prem SQL to ADLS with ADF and Biml – Part 2

Note: This post is about Azure Data Factory V1
I showed in my previous post how we generated the datasets for our Azure Data Factory pipelines. In this post, I’ll show the BimlScript for our pipelines. Pipelines define the activities, identify the input and output datasets for those activities, and set an execution schedule. We were creating several pipelines with copy activities to copy data to Azure Data Lake Store.

We generated one pipeline per schedule and load type:

  • Hourly – Full
  • Hourly – Incremental
  • Daily – Full
  • Daily – Incremental

We also generated some one-time load pipelines for DR/new environment setup.

The first code file below is the template for the pipeline. You can see code nuggets for the data we receive from the generator file and for conditional logic we implemented. The result is one copy activity per source table within the appropriate pipeline.

In the second code file below, lines 104 to 119 are generating the pipelines. We read in the necessary data from the Excel file:

  • Schema name
  • Table name
  • Columns list
  • Incremental predicate

Sidenote: We wrote a quick T-SQL statement (not shown) to generate the columns list. This could have been done in our BimlScript, but it was something we changed after the fact to accommodate the limitations of Polybase (Dear Microsoft: Please fix). SQL was quicker and easier for us, but if I were to do this again I would add that into our BimlScript. We needed to replace new lines and double quotes in our data before we could read it in from the data lake.  You can get around this issue by using .ORC files rather than text delimited files. But the ORC files aren’t human readable, and we felt that was important for adoption of the data lake with the client on this project. They were already jumping in with several new technologies and we didn’t want to add anything else to the stack. So our select statements list out fields and replace the unwanted characters in the string fields.

Our Excel file looks like this.

ADF Biml Metadata

Columns B, C, L, and M are populated by Excel formulas. This is the file that is read in by the BimlScript in the code below.

In our generator file (which is the same file that was used to generate the datasets), we use the CallBimlScript function to call the pipeline template file and pass along the required properties (table, schema, frequency, scope, columns list, predicate).

The great thing about Biml is that I can use it as much or as little as I feel is helpful. That T-SQL statement to get column lists could have been Biml, but it didn’t have to be. The client can maintain and enhance these pipelines with or without Biml as they see fit. There is no vendor lock-in here. Just as with Biml-generated SSIS projects, there is no difference between a hand-written ADF solution and a Biml-generated ADF solution, other than the Biml-generated solution is probably more consistent.

And have I mentioned the time savings? There is a reason why Varigence gives out shirts that say “It’s Monday and I’m done for the week.”

We made changes and regenerated our pipelines a few times, which would have taken hours without Biml. With Biml, it was no big deal.

Thanks to Levi for letting me share some of his code, and for working with me on this project!

Azure, Azure Data Factory, Biml, Data Warehousing, Microsoft Technologies

Copying data from On Prem SQL to ADLS with ADF and Biml – Part 1

Note: This post is about Azure Data Factory V1
Apologies for the overly acronym-laden title as I was trying to keep it concise but descriptive. And we all know that adding technologies to your repertoire means adding more acronyms.

My coworker Levi and I are working on a project where we copy data from an on-premises SQL Server 2014 database and land it in Azure Data Lake Store. Then we use Polybase to get the data into Azure SQL Data Warehouse and build a dimensional model. I’ve done a couple of small projects before with Azure Data Factory, but nothing as large as this one. We had 173 tables that we needed to copy to ADLS. Then we needed to set up incremental loads for 95 of those tables going forward.

My Azure Data Factory is made up of the following components:

  • Gateway – Allows ADF to retrieve data from an on premises data source
  • Linked Services – define the connection string and other connection properties for each source and destination
  • Datasets – Define a pointer to the data you want to process, sometimes defining the schema of the input and output data
  • Pipelines – combine the data sets and activities and define an execution schedule

Each of these objects is defined in a JSON file. Defining data sets and copy activities in JSON gets very tedious, especially when you need to do this for 100+ tables. Tedium usually indicates a repeatable pattern. If there is a repeatable pattern you can probably automate it. The gateway and linked services are one-time setup activities that weren’t worth automating for this project, but the datasets and pipelines definitely were.

In order to automate the generation of datasets and pipelines, we need a little help with some metadata. We had the client help us fill out an Excel spreadsheet that listed each table in our source database and the following characteristics relevant to the load to Azure:

  • Frequency (daily or hourly)
  • Changes Only (incremental or full load)
  • Changed Time Column (datetime column used for incremental loads)

That list plus the metadata we retrieved from SQL server for each table (column names and data types) were all we needed to automate the creation of the ADF datasets and pipelines with BimlScript.

This post will show how we built the data sets. The following post will show the pipelines with the copy activities.

First we need to generate the input datasets coming from SQL Server. We added some properties at the top and embedded some code nuggets to handle the values that are specific to each table.

Next we need the output datasets for Azure Data Lake Store. We use the same three properties in generating each dataset- schema, table, frequency- and we add one more for scope.

Now we just need another BimlScript file that calls these two files. We broke our pipelines up into daily versus hourly and incremental versus full loads.

We used a helper code file and a separate environments file, which I’m glossing over so we can focus on the Biml for the ADF assets.  You can see that we read in the inputs from Excel and write some counts to a log file, just to make sure everything is working as intended. Starting on line 41 is where we generate the datasets. On lines 54 and 55, we use the CallBimlScript function to call the two files above. We end up generating datasets for the tables that are a full load each day and their counterpart datasets for the files we create in ADLS. The datasets for daily incremental loads are generated on lines 69 and 70. Then we do the hourly full loads and hourly incremental loads.  I’ll discuss lines 100 – 119 in my next post.

The Results

We were able to write the BimlScript and generate the datasets and pipelines in about 35 hours. A previous ADF project without automation took about 3 hours per source table. If we had gone that route, we could have been looking at 350 – 500 hours to complete this part of the project. Visual Studio with Biml Express took about 5 minutes to generate everything. Deploying to Azure took about an hour. We are now looking into ARM templates for future deployments.

Stay tuned for part 2 where I show how we generated the ADF pipelines.

Biml, Microsoft Technologies, SSIS

BimlScript – Get to Know Your Control Nuggets

This is post #2 of my BimlScript – Get to Know Your Code Nuggets series. To learn about text nuggets, see my first post.  

The next type of BimlScript code nugget I’d like to discuss is the control nugget. Control nuggets allow you to insert control logic to determine what Biml is generated and used to create your BI artifacts (packages, cubes, etc.).  Control nuggets can be used to:

  • Add conditional logic so that specified Biml fragments are only generated when certain criteria are met
  • Add loops to repeat Biml fragments multiple times
  • Define variables that will be used later
  • Access external data from databases, flat files or other sources that will be used in the generation of the final Biml code

Control nuggets start with <# and end with #>. Just like text nuggets, control nuggets can be a single line or multiple lines. And they can contain simple or complex logic.

There were actually control nuggets in the text nugget example from the previous post. The variable declarations at the top of the file (lines 4 – 10) are control nuggets.

Below is another example file that uses code nuggets. I like to separate my design patterns. project connections, and package generation into separate files. The BimlScript shown below is from a slightly altered “caller file”, the file that I would execute to create the SSIS packages that should follow a specified design pattern. This caller file is generating Type 1 slowly changing dimensions.

In my Biml framework, I store the data I need in order to generate my packages in SQL tables. You can see the control nugget in lines 14 – 40 retrieving data from my database and storing it in a data table for later use.

On line 46, there is a control nugget containing a for each loop. For each row in the data table, it calls a BimlScript file that creates a package that uses my Type 1 SCD design pattern with the variables from data table. Notice that the end curly brace for my loop is in a separate control nugget.

There are many different ways to use control nuggets that aren’t covered here.  From these two examples you can start to see how I might combine text nuggets and control nuggets to automate my SSIS package creation and employ consistent design patterns. But there are still a few missing pieces that will be filled in when I cover the remaining code nugget types.

Biml, Microsoft Technologies, SSIS

BimlScript – Get to Know Your Code Nuggets

gold nuggetsIn BimlScript, we embed nuggets of C# or VB code into our Biml (XML) in order to replace variables and automate the creation of our BI artifacts (databases, tables, SSIS packages, SSAS cubes, etc.). Code nuggets are a major ingredient in the magic sauce that is meta-data driven SSIS development using BimlScript.

There are 5 different types of code nuggets in BimlScript:

  • Text nuggets
  • Control nuggets
  • Class nuggets
  • Comments
  • Directives

Over the next several posts I’ll cover each type of code nugget and provide examples of their use.

Text Nuggets

Text nuggets evaluate the expression they contain and then replace the text nugget with the string representation of the value of the expression. I use them often to switch out names of packages, tasks, and components as well as source and destination tables in SSIS development when creating packages based upon a design pattern.

Text nuggets start with <#= and end with #>. Notice there is an equals sign at the beginning of the text nugget but not at the end.

Text nuggets are very useful. You can include complex business logic in the expressions. And the expression result can be any data type. The BimlScript compiler will automatically convert it to text before replacing the code nugget with the result. Like all code nuggets, text nuggets can be a single line or multiple lines.

Let’s take a look at some examples.

Note: The Biml script below has variables, connections, and a package all in one file for ease of demonstration. Normally I would split these out into separate files. 

This BimlScript creates an SSIS package that truncates and reloads data in a single table.

Notice that you can use the text nuggets alone, together, or with literal text.

The <#=PackageName#> text nugget simply replaces the value for the name of the package with the current value of the PackageName variable I set at the top of the file.

The DataFlow Task name is a combination of text and a code nugget.

<Dataflow Name="DFT Stage_<#=DestinationTable#>">

This comes in handy if you would like to enforce certain naming standards. My personal preference is to use naming conventions suggested by Jamie Thomson, so I build them into my Biml and then use text nuggets as shown above to change the rest of the name to reflect the nature of the current task.

The truncate table statement in the Execute SQL Task contains text and two code nuggets that reference a variable and use a function to convert the variable value into all uppercase text:

Truncate Table [<#=DestinationSchema.ToUpper()#>].[<#=DestinationTable.ToUpper()#>];

Hopefully you can see how text nuggets are useful when creating multiple packages at the same time. They allow you to switch out the values that change for each package and create consistent naming standards.

To get to know other types of code nuggets, see Get To Know Your Control Nuggets.

Biml, Conferences, SSIS

Bimling in the Northeast

I’m expanding my experiences to speak at different SQL Saturdays this year, and I’m very excited to say that I will be speaking at SQLSaturday Boston on March 19th and SQLSaturday Maine on June 4th.

My session at both SQL Saturdays will focus on using BimlScript to create good ETL patterns. SSIS has been around for a while now, but I see people making the same mistakes that are easily resolved. One of the biggest mistakes is simple inconsistency. BimlScript supports consistency, and forces the developer to (when used correctly) recognize patterns and reuse them. This helps us to solve (and share the solution to) known problems, make it easier for junior developers to solve those problems and learn why our solution works, and move on to new and more interesting challenges. And if we find a flaw in our pattern, it’s a quick update to the Biml pattern and regeneration of packages to make the change to all applicable packages.

Everyone involved in creating and supporting your SSIS projects can appreciate consistent design patterns. It makes your patterns more tested (by reuse in multiple situations) and reduces the learning curve for development and support.  BimlScript plus a good ETL framework facilitates quicker start and completion of SSIS projects. As a consultant, I appreciate the consistency when I take over a project (usually, I’m working on phase 2 when someone else built phase 1) and already know how it works. And if the framework is slightly different from what I’m used to (there are valid reasons to build things differently on occasion),  I can look at the Biml from the previous work and easily read what’s going on. Have you ever tried to read the XML code of an SSIS package? It’s ugly. Biml is much easier to read because it doesn’t include position information for the UI and many other attributes that aren’t helpful. And because the code is all on one page, I have less chance of missing a setting that was hiding in the advanced properties somewhere in an existing package.

If you are an SSIS architect and you aren’t familiar with Biml, I urge you to look into it. BimlScript.com has a great learning plan to help you get started and become comfortable using Bimlscript.  In my opinion, it is currently the best (and in many cases free!) way to create your SSIS framework and design patterns.

BIDS Helper, Biml, Data Warehousing, Microsoft Technologies, SQL Server, SSIS

Type 6 or Hybrid Type 2 Slowly Changing Dimension with Biml

In my previous post, I provided the design pattern and Biml for a pure Type 2 Slowly Changing Dimension (SCD). When I say “pure Type 2 SCD”, I mean an ETL process that adds a new row for a change in any field in the dimension and never updates a dimension attribute without creating a new row.  In practice, I tend to create more hybrid Type 2 SCDs where updates to some attributes require a new row and others update the value on the existing rows. A similar pattern that I find I implement more often than a pure Type 2 is a Type 6 SCD. A Type 6 SCD builds on the Type 2 technique by adding current attributes alongside the historical attributes so related measures can be grouped by the historical or current dimension attribute values. The only difference between what I call a hybrid Type 2 and a Type 6 is that in the Type 6, there are no Type 1 attributes in the dimension that do not also have a Type 2 version in the dimension to capture the historical values.

Design Pattern

Control Flow

If you are comfortable with my design pattern for a pure Type 2 SCD in which a change of value in any column causes a new row, this pattern is quite similar. And the control flow is exactly the same. This pattern, as with my pure Type 2, assumes that rows are not deleted from the source system. You could easily modify this to check for deleted rows if needed.

Control Flow for a Hybrid Type 2 or Type 6 Dimension
Control Flow for a Hybrid Type 2 or Type 6 Dimension

The steps in the control flow are:

  1. Log package start.
  2. Truncate the update table.
  3. Execute the data flow.
  4. Execute the update statements to update columns and insert new rows.
  5. Log package end.

The update statements are different in this pattern, and I’ll explain those in detail below.

Data Flow

The data flow looks like a pure Type 2 SCD, with the exception of an added derived column transformation and minor changes to the lookup and conditional split. Again, I use views to integrate the data, apply business logic, and add hashkeys for change detection. Then I use SSIS to perform the mechanics of loading the data.

The steps in this data flow task are:

  1. Retrieve data from my source view.
  2. Count the rows for package logging purposes.
  3. Perform a lookup to see if the entity already exists in the dimension table.
  4. If the entity doesn’t exist at all in the dimension table, it goes into the left path where I count the number of rows, add a derived column that sets the row start date to “01/01/1900 00:00:00”, and then insert the row into the dimension table.
  5. If the entity does exist in the table, I check it for changes.
  6. If there are changes to the entity, I count the number of rows, us a derived column to flag the type(s) of changes to make, and then insert the row into an update table.
  7. Entities with no changes are simply counted for audit purposes.

The Source View

This SSIS pattern requires 3 hashed values for for change detection:

  • HistoricalHashKey: the unique identifier of the entity, the natural key that ties the historical rows together
  • ChangeHashKey: the columns on the dimension that cause a new row to be created and the current row to be expired
  • UpdateHashKey: the columns on the dimension that should be updated in place

In my example view below, the Route ID and Warehouse identify a unique route.  The supervisor, route description and route type are all important attributes of the route.  The route area identifies the metro area in which a route is located. If this should change, there is no need to see previous values; we just want to see the current value on every row.

 CREATE View [dbo].[StgDimRoute] as   
    with [Routebase] as   
    (   SELECT [RouteID]  
    , [RouteDescription]  
    , [Supervisor]  
    , [RouteType]  
    , [Warehouse]  
    , [RouteArea]  
     , 1 as RowIsCurrent     
  FROM Stg.Routes R  
  Left JOIN dbo.StgWarehouse W ON W.WarehouseID = R.WarehouseID  
  UNION   
  select -1 as RouteID, 'Unknown' as RouteDescription, 'Unknown' as Supervisor,   
  'Unknown' as RouteType, 'Unknown' as Warehouse, 'Unknown' as RouteArea, 
  1 as RowIsCurrent 
      ) ,  
 Routedata as   
 (  
 Select RouteID, RouteDescription, Supervisor, RouteType, Warehouse, 
   RouteArea, RowIsCurrent 
  , CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT([RouteID], ' ',Warehouse ))) 
      AS HistoricalHashKey  
  , CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(Supervisor, [RouteDescription], 
      RouteType))) AS ChangeHashKey  
  , Convert(VARBINARY(20), HASHBYTES('MD5', CONCAT(Area,' '))) AS UpdateHashKey  
  from Routebase sb  
  )   
 Select RouteID, RouteDescription, Supervisor, RouteType, Warehouse,  
  RouteArea, RowIsCurrent  
  , HistoricalHashKey, ChangeHashKey, UpdateHashKey  
  , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII  
  , CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII  
  , CONVERT(VARCHAR(34), UpdateHashKey, 1) AS UpdateHashKeyASCII  
  , dateadd(MINUTE, -1, current_timestamp) as RowEndDate
  , CURRENT_TIMESTAMP as RowStartDate   
  from Routedata ds  

The RowEndDate value in this view will be used for routes that require a current row to be expired since my pattern is the leave the end date of the current row null.

The Change Detection Lookup

The lookup in my DFT compares the HistoricalHashKeyASCII column from the source view with the varchar version of the HistoricalHashKey from the dimension table and adds two lookup columns: lkp_ChangeHashKeyASCII and lkp_UpdateHashKeyASCII to the data set.

Type 6 historical hash lookup

Rows that do not match are new rows; i.e., that route has never been in the dimension table before.  Rows that do match have a row in the dimension table and will then be evaluated to see if there are any changes in the values for that route.

Derived Column for New Rows

The no match output of the lookup are new rows for routes that are not in the dimension table. Since this is the first row in the table for that route, we want this row to be effective from the beginning of time until the end of time. The beginning of time in this data mart is 01/01/1900. Since the data is loaded multiple times per day, I use a date/time field rather than a date. If you only need the precision of a day, you can cut your row end date/time back to just a date. In my pattern, the current row has a null row end date, but you could easily add a derived column to set the end date to 12/31/9999 if you prefer.

Derived Column for Start Date

Conditional Split for Change Detection

This time, I have to check to see if both the ChangeHashKeyASCII and the UpdateHashKeyASCII match in my conditional split.

Type 6 CSPL

If both hashed values from the source view match the hashed values from the dimension table, then no changes are required and the row is simply counted for audit purposes.

If either hashed value doesn’t match, there is an update to be made.

Derived Column to Identify Change Types

We again compare the UpdateHashKeyASCII value from the source view with that of the dimension. If they don’t match, we set the UpdateInPlace flag to true. If the ChangeHashKeyASCII values don’t match, we set the UpdateNewRow flag to true. If a row has both types of changes, both types of updates will be made.

My update table contains the UpdateInPlace and UpdateNewRow columns, so I can reference these flags in my update statements.

The Update Statements

The update statements in the control flow take the changes from the update table and apply them to the dimension table. Three statements are executed in the Execute SQL Statement labeled SQL Update DimRoute.

 UPDATE A   
 SET  A.RouteArea = UA.RouteArea,   
      A.UpdateHashKey = UA.UpdateHashKey,   
      A.UpdtDtTm = CURRENT_TIMESTAMP   
 FROM  Updt.UpdtRoute AS UA   
     JOIN Dbo.DimRoute AS A   
      ON UA.HistoricalHashKey = A.HistoricalHashKey   
 WHERE UpdateInPlace = 1; 
  
UPDATE A   
 SET  RowIsCurrent = 0,   
     A.RowEndDate = UA.RowEndDate,   
     A.UpdtDtTm = CURRENT_TIMESTAMP   
 FROM  Updt.UpdtRoute UA   
     JOIN Dbo.DimRoute AS A   
      ON UA.HistoricalHashKey = A.HistoricalHashKey   
 WHERE A.RowIsCurrent = 1   
     AND UA.UpdateNewRow = 1;   

 INSERT INTO Dbo.DimRoute   
       (RouteID,   
        RouteDescription,   
        Supervisor,   
        RouteType,   
        Warehouse,   
        RouteArea,   
        RowIsCurrent,   
        RowStartDate,   
        RowEndDate,   
        HistoricalHashKey,   
        ChangeHashKey,   
        UpdateHashKey)   
 SELECT RouteID,   
        RouteDescription,   
        Supervisor,   
        RouteType,   
        Warehouse,   
        RouteArea,   
        1,   
        RowStartDate,   
        null,   
        HistoricalHashKey,   
        ChangeHashKey,   
        UpdateHashKey
FROM Updt.UpdtRoute WHERE UpdateNewRow = 1; 

The first statement updates the values for the columns designated to be updated in place by joining the update table to the dimension table based on the HistoricalHashKey column. This is the same as performing updates in a Type 1 SCD.

The second statement expires all the rows for which a new row will be added. The third statement inserts the new rows with the RowIsCurrent value set to 1 and the RowEndDate set to null.

The Biml

If you are using Biml, you know that you can create a design pattern for this type of dimension load and reuse it across multiple projects. This speeds up development and ensures that your Type 2 Hybrid or Type 6 dimensions are implemented consistently.

As usual, I have 3 Biml files that are used to create the SSIS package:

  • ProjectConnections.biml – contains all the project-level connections for the project
  • Dim2Hybrid.biml – contains the SSIS design pattern with code nuggets that parameterize it to make it reusable
  • CreateDim2HybridPackages.biml – calls Dim2Hybrid.biml and passes along the values to be used for each package

I’ve pasted Dim2Hybrid and CreateDim2Hybrid below.

Dim2Hybrid.Biml

 <#@ template language="C#" tier="2" #>  
 <#@ property name="PackageName" type="String" #>  
 <#@ property name="DstSchemaName" type="String" #>  
 <#@ property name="DstTableName" type="String" #>  
 <#@ property name="DstConnection" type="String" #>  
 <#@ property name="DataFlowSourceName" type="String" #>  
 <#@ property name="SrcConnection" type="String" #>  
 <#@ property name="SourceQuery" type="String" #>     
 <#@ property name="UpdateSchemaName" type="String" #>     
 <#@ property name="UpdateTableName" type="String" #>     
 <#@ property name="UpdateConnection" type="String" #>        
 <#@ property name="UpdateSQLStatement" type="String" #>      
     <Package Name="<#=PackageName#>" Language="None">  
   <Parameters>  
     <Parameter DataType="String" Name="ParentPackageID">00000000-0000-0000-0000-000000000000</Parameter>  
   </Parameters>  
   <Variables>  
     <Variable EvaluateAsExpression="true" DataType="String" IncludeInDebugDump="Exclude" Name="QualifiedTableSchema">"[" + @[User::SchemaName] + "].[" + @[User::TableName] + "]"</Variable>  
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="QueryAuditStart">EXECUTE [dbo].[PackageControlStart] @PackageName = ?, @PackageId = ?, @ParentPackageId = ?, @ExecutionId = ?, @StartTime = ?;</Variable>  
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="QueryAuditUpdate">EXECUTE [dbo].[PackageControlStop] @PackageId = ?, @ExecutionId = ?, @InsertRowQuantity = ?, @UpdateRowQuantity = ?, @UnchangedRowQuantity=?;</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountChanged">0</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountNew">0</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountSource">0</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountUnchanged">0</Variable>   
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="SchemaName"><#=DstSchemaName#></Variable>  
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="TableName"><#=DstTableName#></Variable>  
   </Variables>  
   <Tasks>    
     <Dataflow Name="DFT_Insert<#=DstTableName#>">  
       <Transformations>  
         <RowCount Name="CNT_Changed_Rows" VariableName="User.RowCountChanged">  
           <InputPath OutputPathName="CSPL Check For Changes.ChangedRows" />  
     </RowCount>  
         <ConditionalSplit Name="CSPL Check For Changes">  
           <InputPath OutputPathName="LKP Historical Key.Match" />  
           <OutputPaths>  
             <OutputPath Name="ChangedRows">  
               <Expression>(ChangeHashKeyASCII != lkp_ChangeHashKeyASCII) || (UpdateHashKeyASCII != lkp_UpdateHashKeyASCII)</Expression>  
     </OutputPath>  
   </OutputPaths>  
     </ConditionalSplit>  
         <RowCount Name="CNT_New_Rows" VariableName="User.RowCountNew">  
           <InputPath OutputPathName="LKP Historical Key.NoMatch" />  
     </RowCount>  
       <DerivedColumns Name="Identify Change Types">  
           <InputPath SsisName="Derived Column Input" OutputPathName="CNT_Changed_Rows.Output" />  
           <Columns>  
             <Column Name="UpdateInPlace" DataType="Boolean">UpdateHashKeyASCII != lkp_UpdateHashKeyASCII ? (DT_BOOL)1 : (DT_BOOL)0</Column>  
             <Column Name="UpdateNewRow" DataType="Boolean">ChangeHashKeyASCII != lkp_ChangeHashKeyASCII ? (DT_BOOL)1 : (DT_BOOL)0</Column>  
   </Columns>  
     </DerivedColumns>  
         <Lookup Name="LKP Historical Key" NoMatchBehavior="RedirectRowsToNoMatchOutput" OleDbConnectionName="<#=DstConnection#>">  
             <DirectInput>SELECT  
   CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII  
 , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII  
  ,CONVERT(VARCHAR(34), UpdateHashKey, 1) as UpdateHashkeyAscii  
 FROM  
  <#=DstSchemaName#>.<#=DstTableName#>  
     where rowiscurrent = 1  
  </DirectInput> 
           <Parameters>  
             <Parameter SourceColumn="HistoricalHashKeyASCII" />  
   </Parameters>  
            <ParameterizedQuery>select * from (SELECT  
  CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII  
 , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII  
  ,CONVERT(VARCHAR(34), UpdateHashKey, 1) as UpdateHashkeyAscii  
 FROM  
  <#=DstSchemaName#>.<#=DstTableName#>  
     where rowiscurrent = 1) [refTable]  
 where [refTable].[HistoricalHashKeyASCII] = ?</ParameterizedQuery>  
            <InputPath OutputPathName="CNT_Source_Rows.Output" />  
           <Inputs>  
             <Column SourceColumn="HistoricalHashKeyASCII" TargetColumn="HistoricalHashKeyASCII" />  
   </Inputs>  
           <Outputs>  
             <Column SourceColumn="ChangeHashKeyASCII" TargetColumn="lkp_ChangeHashKeyASCII" />  
                   <Column SourceColumn="UpdateHashkeyAscii" TargetColumn="lkp_UpdateHashKeyASCII" />  
   </Outputs>  
     </Lookup>  
         <DerivedColumns Name="NewItemRowStartDate">  
        <InputPath OutputPathName="CNT_New_Rows.Output" />  
          <Columns>  
         <Column ReplaceExisting="true" Name="RowStartDate" DataType="DateTime">  
          (DT_DBTIMESTAMP)"01/01/1900 00:00:00"  
         </Column>  
        </Columns>  
       </DerivedColumns>  
            <OleDbDestination Name="OLE_DST_New_Rows" ConnectionName="<#=DstConnection#>">  
           <InputPath OutputPathName="NewItemRowStartDate.Output" />  
            <ExternalTableOutput Table="<#=DstSchemaName#>.<#=DstTableName#>" />   
                <Columns>  
                   <Column SourceColumn="RowEndDate" IsUsed="false"/>  
                </Columns>  
     </OleDbDestination>  
         <RowCount Name="CNT_Source_Rows" VariableName="User.RowCountSource">  
           <InputPath OutputPathName="<#=DataFlowSourceName#>.Output" />  
     </RowCount>  
         <OleDbSource Name="<#=DataFlowSourceName#>" ConnectionName="<#=SrcConnection#>"> 
                <DirectInput><#=SourceQuery#></DirectInput>  
     </OleDbSource>  
         <RowCount Name="CNT_Unchanged_Rows" VariableName="User.RowCountUnchanged">  
           <InputPath OutputPathName="CSPL Check For Changes.Default" />  
     </RowCount>  
         <OleDbDestination Name="OLE_DST Update Table" ConnectionName="<#=DstConnection#>">   
           <InputPath OutputPathName="Identify Change Types.Output" />  
            <ExternalTableOutput Table="[<#=UpdateSchemaName#>].[<#=UpdateTableName#>]" />  
     </OleDbDestination>  
   </Transformations>  
        <PrecedenceConstraints>  
           <Inputs>  
              <Input OutputPathName="SQL Truncate <#=UpdateTableName#>.Output" />   
           </Inputs>  
        </PrecedenceConstraints>  
     </Dataflow>  
     <ExecuteSQL Name="SQL Begin Audit" ConnectionName="Audit">  
       <VariableInput VariableName="User.QueryAuditStart" />  
       <Parameters>  
         <Parameter Name="0" VariableName="System.PackageName" DataType="String" Length="-1" />  
         <Parameter Name="1" VariableName="System.PackageID" DataType="Guid" Length="-1" />  
         <Parameter Name="2" VariableName="ParentPackageID" DataType="Guid" Length="-1" />  
         <Parameter Name="3" VariableName="System.ServerExecutionID" DataType="Int64" Length="-1" />  
         <Parameter Name="4" VariableName="System.StartTime" DataType="Date" Length="-1" />  
   </Parameters>  
     </ExecuteSQL>  
     <ExecuteSQL Name="SQL Close Audit" ConnectionName="Audit">  
       <VariableInput VariableName="User.QueryAuditUpdate" />  
       <Parameters>  
         <Parameter Name="0" VariableName="System.PackageID" DataType="Guid" Length="-1" />  
         <Parameter Name="1" VariableName="System.ServerExecutionID" DataType="Int64" Length="-1" />  
         <Parameter Name="2" VariableName="User.RowCountNew" DataType="Int32" Length="-1" />  
         <Parameter Name="3" VariableName="User.RowCountChanged" DataType="Int32" Length="-1" />  
         <Parameter Name="4" VariableName="User.RowCountUnchanged" DataType="Int32" Length="-1" />  
   </Parameters>  
        <PrecedenceConstraints>  
           <Inputs>  
              <Input OutputPathName="SQL Update <#=DstTableName#>.Output" />  
           </Inputs>  
        </PrecedenceConstraints>  
     </ExecuteSQL>    
     <ExecuteSQL Name="SQL Truncate <#=UpdateTableName#>" ConnectionName="<#=UpdateConnection#>">  
       <DirectInput>Truncate Table [<#=UpdateSchemaName#>].[<#=UpdateTableName#>]</DirectInput>  
        <PrecedenceConstraints>  
           <Inputs>  
              <Input OutputPathName="SQL Begin Audit.Output" />  
           </Inputs>  
        </PrecedenceConstraints>  
     </ExecuteSQL>  
     <ExecuteSQL Name="SQL Update <#=DstTableName#>" ConnectionName="<#=DstConnection#>">  
          <DirectInput><#=UpdateSQLStatement#></DirectInput>  
        <PrecedenceConstraints>  
           <Inputs>  
              <Input OutputPathName="DFT_Insert<#=DstTableName#>.Output" /> 
           </Inputs>  
        </PrecedenceConstraints>  
     </ExecuteSQL>  
   </Tasks>  
 </Package>  

Create Dim2Hybrid.Biml

 

 <#@ template language="C#" hostspecific="true" #>  
 <Biml xmlns="http://schemas.varigence.com/biml.xsd">  
 <#@ include file="ProjectConnection.biml" #>  
 <!--  
    <#  
    string PackageName = "LoadDimRoute";  
    string DstSchemaName = "dbo";  
   string DstTableName = "DimRoute";  
    string DstConnection = "AWBIML";     
    string DataFlowSourceName = "OLE_SRC_StgDimRoute";  
    string SrcConnection = "AWBIML";  
    string SourceQuery = @"Select RouteID, RouteDescription, Supervisor, RouteType, Warehouse,   
   RouteArea, RowIsCurrent   
  , HistoricalHashKey, ChangeHashKey, UpdateHashKey   
  , HistoricalHashKeyASCII   
  , ChangeHashKeyASCII   
  , UpdateHashKeyASCII   
  , RowEndDate  
  , RowStartDate    
  FROM [dbo].[StgDimRoute];" ;  
    string UpdateSchemaName = "Updt";     
    string UpdateTableName = "UpdtRoute";  
    string UpdateConnection = "AWBIML";     
 string UpdateSQLStatement = @" UPDATE A    
  SET A.RouteArea = UA.RouteArea,    
    A.UpdateHashKey = UA.UpdateHashKey,    
    A.UpdtDtTm = CURRENT_TIMESTAMP    
  FROM Updt.UpdtRoute AS UA    
    JOIN Dbo.DimRoute AS A    
    ON UA.HistoricalHashKey = A.HistoricalHashKey    
  WHERE UpdateInPlace = 1;   
  UPDATE A    
  SET RowIsCurrent = 0,    
    A.RowEndDate = UA.RowEndDate,    
    A.UpdtDtTm = CURRENT_TIMESTAMP    
  FROM Updt.UpdtRoute UA    
    JOIN Dbo.DimRoute AS A    
    ON UA.HistoricalHashKey = A.HistoricalHashKey    
  WHERE A.RowIsCurrent = 1    
    AND UA.UpdateNewRow = 1;    
  INSERT INTO Dbo.DimRoute    
     (RouteID,    
     RouteDescription,    
     Supervisor,    
     RouteType,    
     Warehouse,    
     RouteArea,    
     RowIsCurrent,    
     RowStartDate,    
     RowEndDate,    
     HistoricalHashKey,    
     ChangeHashKey,    
     UpdateHashKey)    
  SELECT RouteID,    
     RouteDescription,    
     Supervisor,    
     RouteType,    
     Warehouse,    
     RouteArea,    
     1,    
     RowStartDate,    
     null,    
     HistoricalHashKey,    
     ChangeHashKey,    
     UpdateHashKey  
  FROM Updt.UpdtRoute WHERE UpdateNewRow = 1;" ;     
 #>  
 -->  
   <Packages>  
     <#=CallBimlScript("Dim2Hybrid.biml", PackageName, DstSchemaName, DstTableName, DstConnection, DataFlowSourceName, SrcConnection, SourceQuery, UpdateSchemaName, UpdateTableName, UpdateConnection, UpdateSQLStatement)#>  
   </Packages>  
 </Biml>  

Once I have my source view, dimension table, and update table in the database, the 3 Biml files added to my project, and BIDSHelper installed, all I have to do is right click on the CreateDim2Hybrid.Biml file and choose Generate SSIS packages to create my package.

BIDS Helper, Biml, Microsoft Technologies, SSIS

Demystifying the Type 2 Slowly Changing Dimension with Biml

Most data warehouses have at least a couple of Type 2 Slowly Changing Dimensions. We use them to keep history so we can see what an entity looked like at the time an event occurred. From an ETL standpoint, I think Type 2 SCDs are the most commonly over-complicated and under-optimized design pattern I encounter.

There is a Slowly Changing Dimension Transformation built into SSIS, but most people recommend against using it as it isn’t very efficient.  I think many of the people that do use it do so simply because they feel it’s easier than digging in and understanding the operations that need to be done in order to roll your own Type 2 SCD processing.The most common mistake I see in SCD 2 packages, whether using the built-in transformation or creating your own data flow, is that people use OLEDB commands to perform updates one row at a time rather than writing updates to a staging table and performing a set-based update on all rows.  If your dimension is small, the performance from row by row updates may be acceptable, but the overhead associated with using a staging table and performing set-based update will probably be negligible. So why not keep a consistent pattern for all type 2 dimensions and require no changes if the dimension grows?

I am here to encourage you: don’t be scared of slowly changing dimensions. Once you learn the design pattern, you will find they aren’t that difficult to understand. And once you have the Biml, creating them is fairly painless.

My Design Pattern

BimlScript allows me to create a reusable design pattern for SSIS that I can employ for each Type 2 dimension that I create. Despite the need to keep history, my Type 2 SCD doesn’t look that much different from my Type 1 SCD. In fact the control flow is exactly the same.

I start with an audit step to log the beginning of package execution. Then I truncate the staging table used to contain my updates. Next I have the Data Flow Task to insert new rows into the dimension table and updates into my update table. Then I run an Execute SQL Task to perform the updates. Finally I log the end of package execution.

The Data Flow Task looks a like a Type 1 SCD until the end, where I have a Derived Column Transformation before I insert new rows into my dimension table.

As with my Type 1 pattern, the combination and transformation of the data (the business logic) is performed in a view in SQL Server, and the mechanics of capturing the history is performed in SSIS.

The steps in this data flow task are:

  1. Retrieve data from my source view.
  2. Count the rows for package logging purposes.
  3. Perform a lookup to see if the entity already exists in the dimension table (more info on that below).
  4. If the entity doesn’t exist at all in the dimension table, it goes into the left path where I count the number of rows, add a derived column that sets the row start date to “01/01/1900 00:00:00”, and then insert the row into the dimension table.
  5. If the entity does exist in the table, I check it for changes.
  6. If there are changes to the entity, I count the number of rows and then insert the row into an update table.
  7. Entities with no changes are simply counted for audit purposes.

 The Details

Source View with Hashkeys

I’m using product data from the AdventureWorks database for an example.  I create a view based upon my staging table that holds the data from the source system. In addition to providing the data values from the source data, my view does the following:

  • Adds in the unknown row
  • Adds the hashkeys for change detection
  • Sets the row start date
 CREATE VIEW [dbo].[StgProduct2]  
      AS   
with Productbase as (     
 SELECT [productid],   
     [name],   
     [productnumber],   
     [makeflag],   
     [finishedgoodsflag],   
     [color],   
     [safetystocklevel],   
     [reorderpoint],   
     [standardcost],   
     [listprice],   
     [size],   
     [sizeunitmeasurecode],   
     [weightunitmeasurecode],   
     [weight],   
     [daystomanufacture],   
     [productline],   
     [class],   
     [style],   
     [sellstartdate],   
     [sellenddate],   
     [discontinueddate],   
           1 as RowIsCurrent,   
           0 as isDeleted  
  FROM  [Staging].[Product]  
 UNION   
  Select -1 as productid, 'unknown' as name, null as productnumber, null as makeflag, 
   null as finishedgoodsflag, null as color, null as safetystocklevel,   
   null as reorderpoint, null as standardcost, null as listprice, null as size, 
   null as sizeunitmeasurecode, null as weightunitmeasurecode, null as weight,   
   null as daystomanufacture, null as productline, null as class, null as style, 
   null as sellstartdate, null as sellenddate, null as discontinueddate,   
   1 as RowIsCurrent, 0 as isDeleted          
        ) ,  
productdata as   
 (  
 Select [productid], [name], [productnumber], [makeflag], [finishedgoodsflag], [color], 
 [safetystocklevel], [reorderpoint], [standardcost], [listprice], [size], 
 [sizeunitmeasurecode], [weightunitmeasurecode], [weight], [daystomanufacture], 
 [productline], [class], [style], [sellstartdate], [sellenddate], [discontinueddate], 
 rowiscurrent, isdeleted  
  , CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT([ProductID], ' '))) AS HistoricalHashKey  
  , CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT([name], [productnumber], [makeflag], 
 [finishedgoodsflag], [color], [safetystocklevel], [reorderpoint], [standardcost], [listprice], 
 [size], [sizeunitmeasurecode], [weightunitmeasurecode], [weight], [daystomanufacture], 
 [productline], [class], [style], [sellstartdate], [sellenddate], [discontinueddate],
 rowiscurrent, isDeleted ))) AS ChangeHashKey  
  from Productbase sb  
  )   
Select [productid], [name], [productnumber], [makeflag], [finishedgoodsflag], [color], 
 [safetystocklevel], [reorderpoint], [standardcost], [listprice], [size], [sizeunitmeasurecode],  
 [weightunitmeasurecode], [weight], [daystomanufacture], [productline], [class], [style],  
 [sellstartdate], [sellenddate], [discontinueddate], rowiscurrent, isdeleted  
  ,HistoricalHashKey, ChangeHashKey  
  , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII  
  , CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII  
  , dateadd(MINUTE, -1, current_timestamp) as rowenddate, CURRENT_TIMESTAMP as rowstartdate   
  from productdata ds  

The Change Detection Lookup

The lookup in my DFT compares the HistoricalHashKeyASCII column from the source view with the varchar version of the HistoricalHashKey from the dimension table and adds the lookup column lkp_ChangeHashKeyASCII to the data set.

 

lookup hashkey

Rows that do not match are new rows, as in that product has never been in the dimension table before.  Rows that do match have a row in the dimension table and will then be evaluated to see if there are any changes in the values for that product.

Derived Column for New Rows

The no match output of the lookup are new rows for products that are not in the dimension table. Since this is the first row in the table for that product, we want this row to be effective from the beginning of time until the end of time. The beginning of time in the data warehouse is 01/01/1900. Since I often load the data multiple times per day, I use a date/time field rather than a date. If you only need the precision of a day, you can cut your row end date/time back to just a date. In my pattern, the current row has a null row end date, but you could easily add a derived column to set the end date to 12/31/9999 if you prefer.

Derived Column for Start Date

Conditional Split to Check Existing Products For Changes

For existing products, we only want to update the products for which some attribute has changed. We check for changes using a conditional split. When we performed the lookup against the HistoricalHashKeyASCII value, we added the lookup column lkp_ChangeHashKeyASCII from the dimension table.

The match output from the lookup flows into the conditional split. Then we check to see if there is a difference between the change hash from the source view and the change hash in the dimension table that we have added to our data set.

Conditional Split To Detect Changes

If the change hash values are equal the row hasn’t changed and we can pass it on to get the count of unchanged rows. If the change hash values are not equal, we know some column will have a value to update and we can pass it on to the update staging table.

Update Commands

After the data flow task completes, there is an Execute SQL Task to update the changed rows.  There are two commands in the task:

  1. Set the row end date and row is current flag on all existing rows in the dimension for which there is an update.
  2. Insert the new rows for the changed products.
 UPDATE A   
  SET  RowisCurrent = 0,   
    A.rowenddate = UA.rowenddate,   
    A.updatedttm = CURRENT_TIMESTAMP   
  FROM  [Updt].UpdtProduct2 AS UA   
    JOIN dbo.Product2 AS A   
    ON UA.historicalhashkey = A.historicalhashkey   
  WHERE A.rowiscurrent = 1;   
  
INSERT INTO dbo.Product2   
    ([productid]  
     ,[name]  
     ,[productnumber]  
     ,[makeflag]  
     ,[finishedgoodsflag]  
     ,[color]  
     ,[safetystocklevel]  
     ,[reorderpoint]  
     ,[standardcost]  
     ,[listprice]  
     ,[size]  
     ,[sizeunitmeasurecode]  
     ,[weightunitmeasurecode]  
     ,[weight]  
     ,[daystomanufacture]  
     ,[productline]  
     ,[class]  
     ,[style]  
     ,[sellstartdate]  
     ,[sellenddate]  
     ,[discontinueddate]  
     ,[rowiscurrent]  
     ,[isdeleted]  
     ,[HistoricalHashKey]  
     ,[ChangeHashKey]  
     ,[rowenddate]  
     ,[rowstartdate])   
  SELECT [productid]  
     ,[name]  
     ,[productnumber]  
     ,[makeflag]  
     ,[finishedgoodsflag]  
     ,[color]  
     ,[safetystocklevel]  
     ,[reorderpoint]  
     ,[standardcost]  
     ,[listprice]  
     ,[size]  
     ,[sizeunitmeasurecode]  
     ,[weightunitmeasurecode]  
     ,[weight]  
     ,[daystomanufacture]  
     ,[productline]  
     ,[class]  
     ,[style]  
     ,[sellstartdate]  
     ,[sellenddate]  
     ,[discontinueddate]  
     ,[rowiscurrent]  
     ,[isdeleted]  
     ,[HistoricalHashKey]  
     ,[ChangeHashKey]  
     ,null  
     ,[rowstartdate]   
  FROM  updt.updtProduct2;  

The BimlScript

I have a Dim2.Biml file that just contains the design pattern with code blocks that allow me to pass parameters to it for the particular dimension I’m making. I also have a Biml file that contains the project level connection managers. Then I have a Biml file that obtains the package specific parameters and passes them to the Dim2.Biml file.

ProjectConnections.biml

 <#@ template language="C#" tier="1" #>       
 <Connections>  
  <OleDbConnection Name="AWBIML" ConnectionString ="Data Source=localhost\SQL2014;  
   Initial Catalog=AWBIML;Integrated Security=SSPI;Provider=SQLNCLI11.1;" CreateInProject="true"/>  
  <OleDbConnection Name="Audit" ConnectionString ="Data Source=localhost\SQL2014;  
   Initial Catalog=AWBIML;Integrated Security=SSPI;Provider=SQLNCLI11.1;" CreateInProject="true"/>       
 </Connections>  

The ProjectConnections file doesn’t begin with Biml tags because it gets included in the middle of the CreateDim1.biml file. AWBIML is a local database I created to contain my data mart.

Dim2.biml

This is the file that contains my design pattern.

 <#@ template language="C#" tier="2" #>  
   
 <#@ property name="PackageName" type="String" #>  
 <#@ property name="SourceConnection" type="String" #>  
 <#@ property name="DestinationConnection" type="String" #>  
 <#@ property name="DestinationSchemaName" type="String" #>  
 <#@ property name="DestinationTableName" type="String" #>  
 <#@ property name="DestinationTable" type="String" #>  
 <#@ property name="DataFlowName" type="String" #>  
 <#@ property name="DataFlowSourceName" type="String" #>  
 <#@ property name="DataFlowQuery" type="String" #>       
 <#@ property name="QueryOutputPathName" type="String" #>  
 <#@ property name="DestinationName" type="String" #>  
 <#@ property name="UpdateSchemaName" type="String" #>       
 <#@ property name="UpdateTableName" type="String" #>       
 <#@ property name="UpdateConnection" type="String" #>            
 <#@ property name="UpdateSQLStatement" type="String" #>       
        
        
   <Package Name="<#=PackageName#>" Language="None">  
   <Parameters>  
     <Parameter DataType="String" Name="ParentPackageID">00000000-0000-0000-0000-000000000000</Parameter>  
   </Parameters>  
   <Variables>  
     <Variable EvaluateAsExpression="true" DataType="String" IncludeInDebugDump="Exclude" Name="QualifiedTableSchema">"[" + @[User::SchemaName] + "].[" + @[User::TableName] + "]"</Variable>  
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="QueryAuditStart">EXECUTE [Audit].[PackageControlStart] @PackageName = ?, @PackageId = ?, @ParentPackageId = ?, @ExecutionId = ?, @StartTime = ?;</Variable>  
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="QueryAuditUpdate">EXECUTE [Audit].[PackageControlStop] @PackageId = ?, @ExecutionId = ?, @InsertRowQuantity = ?, @UpdateRowQuantity = ?, @UnchangedRowQuantity=?;</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountChanged">0</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountNew">0</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountSource">0</Variable>  
     <Variable DataType="Int32" IncludeInDebugDump="Include" Name="RowCountUnchanged">0</Variable>  
             
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="SchemaName"><#=DestinationSchemaName#></Variable>  
             
     <Variable DataType="String" IncludeInDebugDump="Exclude" Name="TableName"><#=DestinationTableName#></Variable>  
   </Variables>  
   <Tasks>  
             
     <Dataflow Name="DFT_Insert<#=DestinationTableName#>">  
       <Transformations>  
         <RowCount Name="CNT_Changed_Rows" VariableName="User.RowCountChanged">  
           <InputPath OutputPathName="CSPL Check For Changes.ChangedRows" />  
     </RowCount>  
         <ConditionalSplit Name="CSPL Check For Changes">  
           <InputPath OutputPathName="LKP Historical Key.Match" />  
           <OutputPaths>  
             <OutputPath Name="ChangedRows">  
               <Expression>ChangeHashKeyASCII != lkp_ChangeHashKeyASCII</Expression>  
     </OutputPath>  
   </OutputPaths>  
     </ConditionalSplit>  
         <RowCount Name="CNT_New_Rows" VariableName="User.RowCountNew">  
           <InputPath OutputPathName="LKP Historical Key.NoMatch" />  
     </RowCount>  
         <Lookup Name="LKP Historical Key" NoMatchBehavior="RedirectRowsToNoMatchOutput" OleDbConnectionName="<#=DestinationConnection#>">  
               <DirectInput>SELECT  
  CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII  
 , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII  
 FROM  
  <#=DestinationSchemaName#>.<#=DestinationTableName#>  
  where RowIsCurrent = 1  
  </DirectInput>  
           <Parameters>  
             <Parameter SourceColumn="HistoricalHashKeyASCII" />  
   </Parameters>  
              <ParameterizedQuery>select * from (SELECT  
  CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII  
 , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII  
 FROM  
  <#=DestinationSchemaName#>.<#=DestinationTableName#>) [refTable]  
 where [refTable].[HistoricalHashKeyASCII] = ?</ParameterizedQuery>  
              <InputPath OutputPathName="CNT_Source_Rows.Output" />  
           <Inputs>  
             <Column SourceColumn="HistoricalHashKeyASCII" TargetColumn="HistoricalHashKeyASCII" />  
   </Inputs>  
           <Outputs>  
             <Column SourceColumn="ChangeHashKeyASCII" TargetColumn="lkp_ChangeHashKeyASCII" />  
   </Outputs>  
     </Lookup>  
                      <DerivedColumns Name="NewItemRowStartDate">  
        <InputPath OutputPathName="CNT_New_Rows.Output" />  
                <Columns>  
         <Column ReplaceExisting="true" Name="RowStartDate" DataType="DateTime">  
          (DT_DBTIMESTAMP)"01/01/1900 00:00:00"  
         </Column>  
        </Columns>  
       </DerivedColumns>  
         <OleDbDestination Name="OLE_DST_New_Rows" ConnectionName="<#=DestinationConnection#>">  
           <InputPath OutputPathName="NewItemRowStartDate.Output" />  
              <ExternalTableOutput Table="<#=DestinationSchemaName#>.<#=DestinationTableName#>" />   
                               <Columns>  
                               <Column SourceColumn="RowEndDate" IsUsed="false"/>  
                          </Columns>  
     </OleDbDestination>  
         <RowCount Name="CNT_Source_Rows" VariableName="User.RowCountSource">  
           <InputPath OutputPathName="<#=DataFlowSourceName#>.Output" />  
     </RowCount>  
         <OleDbSource Name="<#=DataFlowSourceName#>" ConnectionName="<#=SourceConnection#>">        
                          <DirectInput><#=DataFlowQuery#></DirectInput>  
     </OleDbSource>  
         <RowCount Name="CNT_Unchanged_Rows" VariableName="User.RowCountUnchanged">  
           <InputPath OutputPathName="CSPL Check For Changes.Default" />  
     </RowCount>  
         <OleDbDestination Name="OLE_DST Update Table" ConnectionName="<#=DestinationConnection#>">  
           <InputPath OutputPathName="CNT_Changed_Rows.Output" />  
              <ExternalTableOutput Table="[<#=UpdateSchemaName#>].[<#=UpdateTableName#>]" />  
     </OleDbDestination>  
   </Transformations>  
          <PrecedenceConstraints>  
               <Inputs>  
                    <Input OutputPathName="SQL Truncate <#=UpdateTableName#>.Output" />   
               </Inputs>  
          </PrecedenceConstraints>  
     </Dataflow>  
     <ExecuteSQL Name="SQL Begin Audit" ConnectionName="Audit">  
       <VariableInput VariableName="User.QueryAuditStart" />  
       <Parameters>  
         <Parameter Name="0" VariableName="System.PackageName" DataType="String" Length="-1" />  
         <Parameter Name="1" VariableName="System.PackageID" DataType="Guid" Length="-1" />  
         <Parameter Name="2" VariableName="ParentPackageID" DataType="Guid" Length="-1" />  
         <Parameter Name="3" VariableName="System.ServerExecutionID" DataType="Int64" Length="-1" />  
         <Parameter Name="4" VariableName="System.StartTime" DataType="Date" Length="-1" />  
   </Parameters>  
     </ExecuteSQL>  
     <ExecuteSQL Name="SQL Close Audit" ConnectionName="Audit">  
       <VariableInput VariableName="User.QueryAuditUpdate" />  
       <Parameters>  
         <Parameter Name="0" VariableName="System.PackageID" DataType="Guid" Length="-1" />  
         <Parameter Name="1" VariableName="System.ServerExecutionID" DataType="Int64" Length="-1" />  
         <Parameter Name="2" VariableName="User.RowCountNew" DataType="Int32" Length="-1" />  
         <Parameter Name="3" VariableName="User.RowCountChanged" DataType="Int32" Length="-1" />  
         <Parameter Name="4" VariableName="User.RowCountUnchanged" DataType="Int32" Length="-1" />  
   </Parameters>  
          <PrecedenceConstraints>  
               <Inputs>  
                    <Input OutputPathName="SQL Update <#=DestinationTableName#>.Output" />   
               </Inputs>  
          </PrecedenceConstraints>  
     </ExecuteSQL>  
     <ExecuteSQL Name="SQL Truncate <#=UpdateTableName#>" ConnectionName="<#=UpdateConnection#>">   
       <DirectInput>Truncate Table [<#=UpdateSchemaName#>].[<#=UpdateTableName#>]</DirectInput>  
          <PrecedenceConstraints>  
               <Inputs>  
                    <Input OutputPathName="SQL Begin Audit.Output" />  
               </Inputs>  
          </PrecedenceConstraints>  
     </ExecuteSQL>  
   
     <ExecuteSQL Name="SQL Update <#=DestinationTableName#>" ConnectionName="<#=DestinationConnection#>">  
                <DirectInput><#=UpdateSQLStatement#></DirectInput>  
          <PrecedenceConstraints>  
               <Inputs>  
                    <Input OutputPathName="DFT_Insert<#=DestinationTableName#>.Output" />   
               </Inputs>  
          </PrecedenceConstraints>  
     </ExecuteSQL>  
   </Tasks>  
 </Package>  
   

CreateDim2Packages.Biml

This file simply feeds values into the code nuggets in the Dim2.biml file. In practice, I store these values in a table and change this file to pull the values from a table so when I generate SSIS packages from this file, all my type 2 dimension packages are generated rather than just one.

 <#@ template language="C#" hostspecific="true" #>  
 <#@ import namespace="System.Data" #>  
 <#@ import namespace="System.Data.SqlClient" #>  
 <#@ import namespace="System.IO" #>  
   
 <Biml xmlns="http://schemas.varigence.com/biml.xsd">  
             
 <!--  
      <#  
      string PackageName = "LoadDimProduct2";  
      string SourceConnection = "AWBIML";  
      string DestinationConnection = "AWBIML";       
      string DestinationSchemaName = "dbo";  
      string DestinationTableName = "Product2";  
      string DestinationTable = "dbo.Product2";  
      string DataFlowName = "DFT_Product2";  
      string DataFlowSourceName = "OLE_SRC_StgProduct";  
      string QueryOutputPathName = "OLE_SRC_StgProduct.Output";  
      string DestinationName = "OLE_DST_Product2";  
      string UpdateSchemaName = "Updt";  
      string UpdateTableName = "UpdtProduct2";  
      string UpdateConnection = "AWBIML";  
      string DataFlowQuery = @"SELECT [productid]  
                                     ,[name]  
                                     ,[productnumber]  
                                     ,[makeflag]  
                                     ,[finishedgoodsflag]  
                                     ,[color]  
                                     ,[safetystocklevel]  
                                     ,[reorderpoint]  
                                     ,[standardcost]  
                                     ,[listprice]  
                                     ,[size]  
                                     ,[sizeunitmeasurecode]  
                                     ,[weightunitmeasurecode]  
                                     ,[weight]  
                                     ,[daystomanufacture]  
                                     ,[productline]  
                                     ,[class]  
                                     ,[style]  
                                     ,[sellstartdate]  
                                     ,[sellenddate]  
                                     ,[discontinueddate]  
                                     ,[rowiscurrent]  
                                     ,[isdeleted]  
                                     ,[HistoricalHashKey]  
                                     ,[ChangeHashKey]  
                                     ,[HistoricalHashKeyASCII]  
                                     ,[ChangeHashKeyASCII]  
                                     ,[rowenddate]  
                                     ,[rowstartdate]  
                                FROM [dbo].[StgProduct2];" ;  
           string UpdateSQLStatement = @"UPDATE A   
           SET  RowisCurrent = 0,   
                  A.rowenddate = UA.rowenddate,   
                  A.updatedttm = CURRENT_TIMESTAMP   
           FROM  [Updt].UpdtProduct2 AS UA   
                  JOIN dbo.Product2 AS A   
                      ON UA.historicalhashkey = A.historicalhashkey   
           WHERE A.rowiscurrent = 1;   
   
           INSERT INTO dbo.Product2   
                          ([productid]  
                       ,[name]  
                       ,[productnumber]  
                       ,[makeflag]  
                       ,[finishedgoodsflag]  
                       ,[color]  
                       ,[safetystocklevel]  
                       ,[reorderpoint]  
                       ,[standardcost]  
                       ,[listprice]  
                       ,[size]  
                       ,[sizeunitmeasurecode]  
                       ,[weightunitmeasurecode]  
                       ,[weight]  
                       ,[daystomanufacture]  
                       ,[productline]  
                       ,[class]  
                       ,[style]  
                       ,[sellstartdate]  
                       ,[sellenddate]  
                       ,[discontinueddate]  
                       ,[rowiscurrent]  
                       ,[isdeleted]  
                       ,[HistoricalHashKey]  
                       ,[ChangeHashKey]  
                       ,[rowenddate]  
                       ,[rowstartdate])   
           SELECT          [productid]  
                       ,[name]  
                       ,[productnumber]  
                       ,[makeflag]  
                       ,[finishedgoodsflag]  
                       ,[color]  
                       ,[safetystocklevel]  
                       ,[reorderpoint]  
                       ,[standardcost]  
                       ,[listprice]  
                       ,[size]  
                       ,[sizeunitmeasurecode]  
                       ,[weightunitmeasurecode]  
                       ,[weight]  
                       ,[daystomanufacture]  
                       ,[productline]  
                       ,[class]  
                       ,[style]  
                       ,[sellstartdate]  
                       ,[sellenddate]  
                       ,[discontinueddate]  
                       ,[rowiscurrent]  
                       ,[isdeleted]  
                       ,[HistoricalHashKey]  
                       ,[ChangeHashKey]  
                       ,null  
                       ,[rowstartdate]   
           FROM  updt.updtProduct2;" ;  
        
      #>  
      -->  
                <#@ include file="ProjectConnection.biml" #>  
                            
                     <Packages>  
                          <#=CallBimlScript("Dim2.biml", PackageName, SourceConnection, DestinationConnection, DestinationSchemaName, DestinationTableName, DestinationTable, DataFlowName, DataFlowSourceName, DataFlowQuery, QueryOutputPathName, DestinationName, UpdateSchemaName, UpdateTableName, UpdateConnection, UpdateSQLStatement)#>  
                     </Packages>  
   
 </Biml>  

Get the Biml

You can download the Biml and SQL files necessary to use this pattern here.  This zip file contains:

  • 3 Biml files (ProjectConnections, Dim2, CreateDim2Packages)
  • 1 Word doc with a brief explanation of the files and design pattern
  • 8 SQL files to make the schemas, tables, views, and stored procedures for the product dimension.

This pattern is for a pure Type 2 SCD. I’ll post again soon with a pattern that handles hybrid Type 2 and Type 6 SCDs.

For more info on Biml, see my other Biml posts or check out BimlScript.com.

BIDS Helper, Biml, Data Warehousing, Microsoft Technologies, SSIS, T-SQL

Biml for a Type 1 Slowly Changing Dimension

I’ve been working on building my Biml library over the last few months. One of the first design patterns I created was a Type 1 Slowly Changing Dimension where all fields except the key fields that define the level of granularity are overwritten with updated values.  It assumes I have a staging table, but it could probably be easily modified to pull directly from the source table if needed.

For simplicity, I’m going to create DimSalesReason from the AdventureWorks database as my example. Please pardon the large amount of code in this post. I wanted to provide a fairly complete picture so you can see how the pieces fit together so I’m providing all of the SQL and Biml needed to generate the package.

Required Database Objects

My staging table is a copy of source table and looks like this:

CREATE TABLE [Staging].[SalesReason](
    [SalesReasonID] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [ReasonType] [nvarchar](50) NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
)

Here is the DDL for my dimension table:

CREATE TABLE [dbo].[DimSalesReason](
    [SalesReasonKey] int IDENTITY(1,1) NOT NULL Primary Key
    ,[SalesReasonID] int not null
    ,[SalesReasonName] nvarchar(50) not null
    ,[SalesReasonType] nvarchar(50) not null
    ,[HistoricalHashKey] varbinary(20) NOT NULL
    ,[ChangeHashKey] varbinary(20) NOT NULL
    ,[InsertDttm] datetime not null
    ,[UpdateDttm] datetime NULL
)

You’ll notice I have some audit fields in my table. UpdateDttm is the datetime when the row was last updated. InsertDttm is the datetime when the row was initially inserted. The HistoricalHashKey and ChangeHashKey are used for change detection. I may not need them in a dimension this simple, but I use them in larger/wider dimensions and I like my dimensions to be built consistently (unless they need to be tweaked for performance). The HistoricalHashKey represents the business key of the table that defines the level of uniqueness for the row in the dimension table. The ChangeHashKey represents all the other fields that are used in the dimension row. This makes it easier to determine if values have changed since I can compare just the ChangeHashKey instead of each individual field.

I also have an update table. This allows me to update in bulk rather than updating in place row by row. With a very small table, you might not notice a performance difference, but this pattern becomes much more efficient as the table grows.

CREATE TABLE [Updt].[UpdtSalesReason](
	 [SalesReasonID] int not null
	,[SalesReasonName] nvarchar(50) not null
	,[SalesReasonType] nvarchar(50) not null
	,[HistoricalHashKey] varbinary(20) NOT NULL
	,[ChangeHashKey] varbinary(20) NOT NULL
	,[InsertDttm] datetime not null
)

I use a view to do most of the transformation work (joins, business logic, etc.) for my Type 1 SCD package. Here’s my view for the SalesReasonDimension:

Create View [Staging].[DimSalesReason] as
With SalesReasonData as (
	Select	 [SalesReasonID] 
		,[Name] as [SalesReasonName] 
		,[ReasonType] as [SalesReasonType] 
		,CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(SalesReasonID, ' '))) 
                                                               AS [HistoricalHashKey]
		,CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(Name, ' ', ReasonType))) 
                                                                   AS [ChangeHashKey]
		,CURRENT_TIMESTAMP as InsertDttm
		,CURRENT_TIMESTAMP as UpdtDttm
		from 
		[Staging].[SalesReason]
	UNION 
	Select	-1 as [SalesReasonID]
	   , 'Unknown' as [SalesReasonName]
	   , 'Unknown' as [SalesReasonType]
	   ,CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(-1, ' '))) 
                                                               AS [HistoricalHashKey]
	   ,CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT('Unknown', ' ', 'Unknown')))
                                                               AS [ChangeHashKey]
	   ,CURRENT_TIMESTAMP as InsertDttm
           ,CURRENT_TIMESTAMP as UpdtDttm
)
Select SalesReasonID
	   , SalesReasonName
	   , SalesReasonType
	   , HistoricalHashKey
	   , ChangeHashKey
	   , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII
	   , CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII 
	   , InsertDttm
	   , UpdtDttm
from SalesReasonData

Using a view to do the transformation work allows me to use the SQL Server engine to do the things it is good at (joins, case statements, conversions) and SSIS to do the things it is good at (controlling the flow of data). It also makes it easy to create a design pattern in Biml that is abstract enough that it easily fits most SCD 1 scenarios. The hashkey fields allow me to do a lookup on one field to determine if the row already exists in the dimension table and a quick comparison to determine if any of the values for that row have changed. I chose to use an MD5 hash because I think it is a good balance of speed/size and collision risk. You may feel differently (others prefer SHA1 due to lower collision risk). On a dimension this size, the difference probably isn’t noticeable.

You can also see that I add my unknown member row in my view. Some people prefer to set the surrogate key of the unknown value to -1. I prefer to set the business key to -1 and let the surrogate key be set to any value in the load process. I’m not a fan of having to turn off the identity insert to add the unknown row. When I do the dimension key lookup for fact tables, I just look for the row where the business key = -1. This also means I don’t have to check my dimension tables to see if someone remembered to insert the unknown rows after deployment to a new environment because I know the unknown rows will be inserted when the package is run.

I have audit steps in my package that write to the Audit.PackageControl table using stored procedures.

CREATE TABLE [Audit].[Package_Control](
	[Package_NM] [varchar](100) NOT NULL,
	[Package_ID] [uniqueidentifier] NOT NULL,
	[Parent_Package_ID] [uniqueidentifier] NULL,
	[Execution_ID] [bigint] NULL,
	[Start_TS] [datetime] NOT NULL,
	[Stop_TS] [datetime] NULL,
	[Insert_Row_QT] [int] NULL,
	[Update_Row_QT] [int] NULL,
	[Unchanged_Row_QT] [int] NULL,
	[Deleted_Row_QT] [int] NULL,
	[Duration_s]  AS (datediff(second,[Start_TS],[Stop_TS])),
	[PackageLogID] [int] IDENTITY(1,1) NOT NULL
)

CREATE PROCEDURE [Audit].[PackageControlStart]
(
    @PackageName varchar(100)
,   @PackageId uniqueidentifier
,   @ParentPackageId uniqueidentifier = NULL
,   @ExecutionId bigint
,   @StartTime DATETIME
,   @StopTime datetime = NULL
,   @InsertRowQuantity int = NULL
,   @UpdateRowQuantity int = NULL
,   @UnchangedRowQuantity int = NULL
)
AS
BEGIN
    SET NOCOUNT ON;
	DECLARE @PackageLogId int 
    INSERT INTO [Audit].[Package_Control]
    (
        [Package_NM]
    ,   [Package_ID]
    ,   [Parent_Package_ID]
    ,   [Execution_ID]
    ,   [Start_TS]
    ,   [Stop_TS]
    ,   [Insert_Row_QT]
    ,   [Update_Row_QT]
    ,	[Unchanged_Row_QT]
    )
    SELECT
        @PackageName 
    ,   @PackageId 
    ,   @ParentPackageId 
    ,   @ExecutionId 
    ,   CURRENT_TIMESTAMP
    ,   @StopTime 
    ,   @InsertRowQuantity 
    ,   @UpdateRowQuantity 
    ,	@UnchangedRowQuantity
  SELECT @PackageLogID = SCOPE_IDENTITY()
  SELECT  @PackageLogID as PackageLogID
END

CREATE PROCEDURE [Audit].[PackageControlStop]
(
    @PackageId uniqueidentifier
,   @ExecutionId bigint
,   @InsertRowQuantity int = NULL
,   @UpdateRowQuantity int = NULL
,	@UnchangedRowQuantity int = NULL
)
AS
BEGIN
    SET NOCOUNT ON;
    -- Close out the execution.
    UPDATE PC
    SET [Stop_TS] = CURRENT_TIMESTAMP  
    ,   [Insert_Row_QT] = @InsertRowQuantity
    ,   [Update_Row_QT] = @UpdateRowQuantity
	,	[Unchanged_Row_QT] = @UnchangedRowQuantity
    FROM  [Audit].[Package_Control] AS PC
    WHERE PC.Package_ID = @PackageId
        AND PC.Execution_ID = @ExecutionId
        AND PC.[Stop_TS] IS NULL;  
END

And Now for the Biml

My Biml library usually contains 3 files for each package type:

  • ProjectConnections – I like to keep my connection managers in a separate file so I only have to update one place if I need to add or change a connection.
  • Dim1 – This contains my actual design pattern.
  • CreateDim1 – This is the Biml file I run to generate the package. It gets separated so I can pull values from databases and pass in variables to my design pattern using BimlScript. For this example I have hardcoded my variables into this file rather than pulling from a database.

ProjectConnections.Biml

<#@ template language="C#" tier="1" #>	
<Connections>
    <OleDbConnection Name="AWBIML" ConnectionString ="Data Source=localhost\SQL2014;
        Initial Catalog=AWBIML;Integrated Security=SSPI;Provider=SQLNCLI11.1;" CreateInProject="true"/>
    <OleDbConnection Name="Audit" ConnectionString ="Data Source=localhost\SQL2014;
        Initial Catalog=AWBIML;Integrated Security=SSPI;Provider=SQLNCLI11.1;" CreateInProject="true"/>	
</Connections>

The ProjectConnections file doesn’t begin with Biml tags because it gets included in the middle of the CreateDim1.biml file. AWBIML is a local database I created to contain my data mart.

Dim1.Biml

<#@ template language="C#" tier="2" #>

<#@ property name="PackageName" type="String" #>
<#@ property name="DstSchemaName" type="String" #>
<#@ property name="DstTableName" type="String" #>
<#@ property name="DstConnection" type="String" #>
<#@ property name="DataFlowSourceName" type="String" #>
<#@ property name="SrcConnection" type="String" #>
<#@ property name="SourceQuery" type="String" #>	
<#@ property name="UpdateSchemaName" type="String" #>	
<#@ property name="UpdateTableName" type="String" #>	
<#@ property name="UpdateConnection" type="String" #>		
<#@ property name="UpdateSQLStatement" type="String" #>	
	
<Package Name="<#=PackageName#>" Language="None">
    <Parameters>
        <Parameter DataType="String" Name="ParentPackageID">00000000-0000-0000-0000-000000000000</Parameter>
    </Parameters>
    <Variables>
        <Variable EvaluateAsExpression="true" DataType="String"  
          Name="QualifiedTableSchema">"["+@[User::SchemaName]+"].["+@[User::TableName]+"]"
        </Variable>
        <Variable DataType="String" Name="QueryAuditStart">
          EXECUTE [Audit].[PackageControlStart] @PackageName=?, @PackageId=?, 
          @ParentPackageId=?,@ExecutionId=?, @StartTime=?;
        </Variable>
        <Variable DataType="String" Name="QueryAuditUpdate">
          EXECUTE [Audit].[PackageControlStop] @PackageId=?, @ExecutionId=?, 
          @InsertRowQuantity=?, @UpdateRowQuantity=?, @UnchangedRowQuantity=?;</Variable> 
        <Variable DataType="Int32"  Name="RowCountChanged">0</Variable>
        <Variable DataType="Int32"  Name="RowCountNew">0</Variable>
        <Variable DataType="Int32"  Name="RowCountSource">0</Variable>
        <Variable DataType="Int32"  Name="RowCountUnchanged">0</Variable>
        <Variable DataType="String" Name="SchemaName"><#=DstSchemaName#></Variable>
        <Variable DataType="String" Name="TableName"><#=DstTableName#></Variable>
    </Variables>
    
    <Tasks>
      <ExecuteSQL Name="SQL Begin Audit" ConnectionName="Audit">
       <VariableInput VariableName="User.QueryAuditStart" />
        <Parameters>
          <Parameter Name="0" VariableName="System.PackageName" DataType="String" Length="-1"/>
          <Parameter Name="1" VariableName="System.PackageID" DataType="Guid" Length="-1"/>
          <Parameter Name="2" VariableName="ParentPackageID" DataType="Guid" Length="-1"/>
          <Parameter Name="3" VariableName="System.ServerExecutionID" DataType="Int64" 
            Length="-1"/>
          <Parameter Name="4" VariableName="System.StartTime" DataType="Date" Length="-1"/>
        </Parameters>
      </ExecuteSQL>

        <ExecuteSQL Name="SQL Truncate <#=UpdateTableName#>" 
          ConnectionName="<#=UpdateConnection#>"> 
            <DirectInput>Truncate Table [<#=UpdateSchemaName#>].[<#=UpdateTableName#>]
            </DirectInput>
          <PrecedenceConstraints>
            <Inputs>
              <Input OutputPathName="SQL Begin Audit.Output" />
            </Inputs>
          </PrecedenceConstraints>
        </ExecuteSQL>

        <Dataflow Name="DFT Insert<#=DstTableName#>">
            
            <Transformations>
            
                <RowCount Name="CNT Changed_Rows" VariableName="User.RowCountChanged">
                    <InputPath OutputPathName="CSPL Check For Changes.ChangedRows" />
                </RowCount>
                
                <ConditionalSplit Name="CSPL Check For Changes">
                    <InputPath OutputPathName="LKP Historical Key.Match" />
                    <OutputPaths>
                        <OutputPath Name="ChangedRows">
                          <Expression>ChangeHashKeyASCII != lkp_ChangeHashKeyASCII</Expression>
                        </OutputPath>
                    </OutputPaths>
                </ConditionalSplit>
                
                <RowCount Name="CNT New_Rows" VariableName="User.RowCountNew">
                    <InputPath OutputPathName="LKP Historical Key.NoMatch" />
                </RowCount>
        
                <Lookup Name="LKP Historical Key" NoMatchBehavior="RedirectRowsToNoMatchOutput" 
                  OleDbConnectionName="<#=DstConnection#>">
                    <DirectInput>SELECT
                      CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII
                      , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII
                      FROM
                      <#=DstSchemaName#>.<#=DstTableName#></DirectInput>
                   <Parameters>
                     <Parameter SourceColumn="HistoricalHashKeyASCII" />
                   </Parameters>
	           <ParameterizedQuery>select * from (SELECT
                     CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII
                     , CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII
                     FROM
                     <#=DstSchemaName#>.<#=DstTableName#>) [refTable]
                     where [refTable].[HistoricalHashKeyASCII] = ?</ParameterizedQuery>
                  <InputPath OutputPathName="CNT Source_Rows.Output" />
                  <Inputs>
                    <Column SourceColumn="HistoricalHashKeyASCII" 
                      TargetColumn="HistoricalHashKeyASCII" />
                  </Inputs>
                  <Outputs>
                    <Column SourceColumn="ChangeHashKeyASCII" 
                    TargetColumn="lkp_ChangeHashKeyASCII" />
                  </Outputs>
                </Lookup>
       
            <OleDbDestination Name="OLE_DST New_Rows" ConnectionName="<#=DstConnection#>">
              <InputPath OutputPathName="CNT New_Rows.Output" />
	      <ExternalTableOutput Table="<#=DstSchemaName#>.<#=DstTableName#>" /> 
            </OleDbDestination>
            
            <RowCount Name="CNT Source_Rows" VariableName="User.RowCountSource">
              <InputPath OutputPathName="<#=DataFlowSourceName#>.Output" />
            </RowCount>
            
            <OleDbSource Name="<#=DataFlowSourceName#>" ConnectionName="<#=SrcConnection#>">
	      <DirectInput><#=SourceQuery#></DirectInput>
            </OleDbSource>
               
            <RowCount Name="CNT Unchanged_Rows" VariableName="User.RowCountUnchanged">
              <InputPath OutputPathName="CSPL Check For Changes.Default" />
            </RowCount>
                
            <OleDbDestination Name="OLE_DST Update Table" ConnectionName="<#=DstConnection#>"> 
              <InputPath OutputPathName="CNT Changed_Rows.Output" />
	      <ExternalTableOutput Table="[<#=UpdateSchemaName#>].[<#=UpdateTableName#>]" />
            </OleDbDestination>
    
     </Transformations>
    
     <PrecedenceConstraints>
      <Inputs>
          <Input OutputPathName="SQL Truncate <#=UpdateTableName#>.Output" /> 
      </Inputs>
     </PrecedenceConstraints>
    </Dataflow>
            
    <ExecuteSQL Name="SQL Update <#=DstTableName#>" 
    ConnectionName="<#=DstConnection#>">
      <DirectInput><#=UpdateSQLStatement#></DirectInput>
        <PrecedenceConstraints>
          <Inputs>
            <Input OutputPathName="DFT Insert<#=DstTableName#>.Output" /> 
          </Inputs>
        </PrecedenceConstraints>
    </ExecuteSQL>
    
    <ExecuteSQL Name="SQL Close Audit" ConnectionName="Audit">
      <VariableInput VariableName="User.QueryAuditUpdate" />
        <Parameters>
          <Parameter Name="0" VariableName="System.PackageID" DataType="Guid" Length="-1"/>
          <Parameter Name="1" VariableName="System.ServerExecutionID" DataType="Int64"
            Length="-1"/>
          <Parameter Name="2" VariableName="User.RowCountNew" DataType="Int32" Length="-1"/>
          <Parameter Name="3" VariableName="User.RowCountChanged" DataType="Int32" 
            Length="-1"/>
          <Parameter Name="4" VariableName="User.RowCountUnchanged" DataType="Int32" 
            Length="-1"/>
        </Parameters>
        <PrecedenceConstraints>
          <Inputs>
            <Input OutputPathName="SQL Update <#=DstTableName#>.Output" /> 
          </Inputs>
        </PrecedenceConstraints>
    </ExecuteSQL>
				    
  </Tasks>

</Package>

CreateDim1.Biml

<#@ template language="C#" hostspecific="true" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ include file="ProjectConnection.biml" #>

	<!--
	<#
	string PackageName  = "LoadDimSalesReason";
	string DstSchemaName = "dbo";
	string DstTableName = "DimSalesReason";
	string DstConnection = "AWBIML";
	string DataFlowSourceName = "OLE_SRC StgDimSalesReason";
	string SrcConnection = "AWBIML";
	string SourceQuery = @"SELECT [SalesReasonID]
					  ,[SalesReasonName]
					  ,[SalesReasonType]
					  ,[HistoricalHashKey]
					  ,[ChangeHashKey]
					  ,[HistoricalHashKeyASCII]
					  ,[ChangeHashKeyASCII]
					  ,[InsertDttm]
					  ,[UpdtDttm]
				  FROM [AWBIML].[Staging].[DimSalesReason]";
	string UpdateSchemaName = "Updt";
	string UpdateTableName = "UpdtSalesReason";
	string UpdateConnection = "AWBIML";
	string UpdateSQLStatement = @"Update d 
					set 
					d.SalesReasonName = u.SalesReasonName,
					d.SalesReasonType = u.SalesReasonType,
					d.changehashkey = u.changehashkey 
					FROM Updt.UpdtSalesReason u 
					inner join dbo.DimSalesReason d
                                        on u.HistoricalHashKey = d.HistoricalHashKey;";
	#>
	-->
	
    <Packages>
        <#=CallBimlScript("Dim1.biml", PackageName, DstSchemaName, DstTableName, DstConnection,
        DataFlowSourceName, SrcConnection, SourceQuery, UpdateSchemaName, UpdateTableName, 
        UpdateConnection, UpdateSQLStatement)#>
    </Packages>
	
</Biml>

Resulting SSIS Package

Executing the CreateDim1.Biml file generates a package called LoadDimSalesReason.dtsx. Here is the control flow:
BIML Dim1 ControlFlow
I start the package by logging the package start in my PackageControl table. Then I truncate my update table to prepare for the new data load. Next I have a data flow task, which inserts data into either the dbo.DimSalesReason table or the Updt.UpdtSalesReason table (see below). The SQL Update task updates the DimSalesReason table with the rows that were inserted into UpdtSalesReason. And finally, I log my package completion.

Here’s my data flow:
BIML Dim1 DataFlow
First, I retrieve my data from my staging view. Then I count the number of rows coming from my source so I can log it in my PackageControl table. The lookup on the HistoricalHashKey field sends rows with no match to be inserted into the dimension table. If the rows have a match, they are checked to see if their ChangeHashKey values match the value of the ChangeHaskey in the existing row in the dimension table. If ChangeHashKey values match, the row is counted and nothing else is done. If the ChangeHashKey values don’t match, the row is counted and written to the update table.

And that is all you need to create a Type 1 dimension load with Biml.