Microsoft Technologies, SSIS

Using a tnsnames.ora file with the Microsoft Connector for Oracle in SSIS

One of the nice things about the Microsoft Connector for Oracle is that it doesn’t require installation of an Oracle client. But because of this, you may not have the expected settings and files on the computer where your SSIS package is running.

A client ran into this recently, and the answer was to create a system environment variable.

Although you can now specify an Oracle database using the EzConnect format, it’s still very common to use a tnsnames.ora file. This file specifies a connection name and connection details for an Oracle database. It’s helpful in that you can provide a “friendly name” to the database specified in the file and then reference the file from multiple client tools.
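
A minimal example of an entry in a tnsnames.ora file looks like the snippet below; the alias, host, and service names shown are placeholders, not values from any real environment.

MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oraserver01.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = MYDB.example.com)
    )
  )

A connection manager can then reference the alias (MYDB above) rather than the full connection details.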

When using a tnsnames.ora file, the connector needs to know the location of the folder containing that file. This location can be specified by a system environment variable or a registry key.

If you have installed an Oracle client, the tnsnames.ora file will likely be located in the ORACLE_HOME\network\admin folder and you will already have the required registry key and/or environment variable.

If you have not installed an Oracle client, you will likely need to add an environment variable on the computer running SSIS.

The New System Variable window in Windows 10, with the Variable name populated with "TNS_ADMIN" and the Variable value left blank

To add the environment variable in Windows 10, Windows 11 or Windows Server 2022:

  1. Right-click the Start icon and select System.
  2. In the Settings window, select Advanced System Settings.
  3. On the Advanced tab of the System Properties window, select Environment Variables.
  4. In the Environment Variables window under System, select New.
  5. In the New System Variable window, enter “TNS_ADMIN” for the Variable name and the correct path to the folder that contains your tnsnames.ora file for the Variable value.
  6. Select OK in the New System Variable, Environment Variables, and System Properties windows.
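
If you prefer to script this step (for example, when configuring several servers), you can set the same system environment variable from an elevated command prompt. The folder path below is just a placeholder; point it at the folder that contains your tnsnames.ora file.

setx TNS_ADMIN "C:\oracle\network\admin" /M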

Vague Error Message

If the Oracle connector cannot find the tnsnames.ora file, it doesn’t return a detailed error message to tell you this. Instead, you get the generic “DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER” telling you the AcquireConnection method call to the connection manager failed. Basically, it assumes that you are supplying the connection info in EzConnect format and then can’t find the database at the friendly name you provided in the connection manager. Microsoft Support helped us determine that this was the cause of our connectivity issues after we logged a support request.

I hope this helps someone avoid time spent troubleshooting. If you have more tips about the Microsoft Connector for Oracle or tnsnames.ora files, feel free to leave them in the comments.

SQL Server, SSIS

The Many Oracle Connectors for SSIS

I recently worked with a client who was upgrading and deploying several SSIS projects to a new server. The SSIS packages connected to an Oracle database in various tasks. There were:

  • Data Flows that used the Oracle Source and Oracle Destination
  • Data Flows that used an OLE DB connection to Oracle
  • Execute SQL tasks that connected to Oracle via an ADO.Net connector

We needed to make sure we had the most current version of the drivers required to make the package work.

This turned out to be an adventure involving 3 different drivers and plans to refactor, so I’m documenting some of it here in case it helps someone else.

SSIS Target Versions and Visual Studio Versions

It’s important to understand the relationship between Visual Studio versions and SSIS target versions. In Visual Studio, we must install an extension to develop SSIS packages. The Visual Studio version is distinct from, but related to, the SSIS target version. Visual Studio 2019 with the 2019 SSIS extension supports SSIS target versions 2012 through 2022. Visual Studio 2022 with the 2022 SSIS extension supports SSIS target versions 2017 through 2022.

The target version is set in the project configuration properties. This target version should match the version of SSIS running on the server where you plan to deploy the SSIS packages.

Oracle Driver Changes since SSIS 2017

In the olden days (pre-SQL 2019) we used the Attunity Connector to connect to Oracle because it provided the best performance. Attunity was acquired by Qlik in 2019, and Microsoft took over maintaining and distributing the connectors. If you need the Attunity connectors for SSIS versions 2012 – 2017, you will find the links for the downloads here.

For target SQL Server versions 2019 through 2022, we now have the Microsoft Connector for Oracle. This new connector is interesting because we are no longer required to install an Oracle client to make it work. It also allows us to connect directly to an Oracle server without having to use a TNSNAMES.ORA file if we so choose, but the TNSNames file can still be used if desired.

There are separate downloads of the Microsoft Connector for Oracle for SQL Server 2019 and 2022. And you have to download the 32-bit driver and 64-bit driver separately.

Different Drivers for Different Connection Types

Attunity wasn’t the only way to connect to Oracle using SSIS. As noted above, you could also use a .Net provider for Oracle or an OLE DB provider for Oracle. To reduce complexity and increase supportability, I advise consolidating on a single driver for connecting to Oracle as much as possible. But note that the Microsoft Connector for Oracle cannot be used in an Execute SQL task, so you might still end up with two different connectors unless you do a fair amount of refactoring.

This gets a bit confusing due to naming, so I’m linking to the current version of drivers (as of May 2023) to be used with SSIS target version 2022. These are the drivers I’m aware of that I’m sure are supported by either Microsoft or Oracle. There are some other drivers out there that are either no longer maintained or provided by third parties.

Connection Type     | Driver                             | Provider Name in Visual Studio
Oracle Connection   | Microsoft Connector for Oracle     | Oracle Connection Manager
OLE DB Connection   | ODAC 21c                           | Oracle Provider for OLE DB
ADO.Net Connection  | Oracle Client for Microsoft Tools  | OracleClient Data Provider

You’ll need to install both the 32-bit and 64-bit drivers. The 32-bit is used in Visual Studio when developing SSIS packages. The 64-bit driver is used by default when executing packages on the SSIS server (unless you configure the package to execute in 32-bit mode).

In SSIS 2022, if you are reviewing SSIS logs and see errors related to an Oracle connection, and you see mention of ODBC and the Oracle Connection Manager, that is the Microsoft Connector for Oracle.

If you see errors in the SSIS logs related to an Oracle connection, and you see mention of OLE DB provider OraOLEDB.Oracle.1, that is the driver from ODAC.

I hope this clarifies your options and the drivers required for connecting SSIS to Oracle.

Data Warehousing, Microsoft Technologies, SSIS

My Preferences for SSIS Design

Lately, I have been using SSIS execution frameworks and Biml created by other people to populate data marts and data warehouses. It has taught me a few things and helped me clarify what I like and dislike compared to my usual framework. I’ve got the beginning of my preferences list started below. There are probably situations where I would want to deviate from my preferences, but I think they make a good starting point.

Populating Data

  • For self-service BI environments, use a date dimension that doesn’t go out much further than the greatest date in your data. This can be a view or stored procedure that limits and updates dates rather than a static date dimension that goes out until the end of time.
  • Unknown values are included in normal dimension loads, not in separate scripts that must be run on deployment. This way, if an unknown value is ever left out or deleted, it will be added in the next data load rather than requiring a special execution of a script.
  • Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.
  • Whatever you use to create tables, include primary keys, foreign keys, and indexes with your table definitions. Provide explicit constraint names to simplify database comparisons (see the sketch after this list). You can disable your foreign keys, but they need to be there to provide that metadata.
  • Separate your final dimensional/reporting tables from audit tables and staging tables. This can be done with separate schemas or even separate databases.
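
Below is a minimal T-SQL sketch of the audit column and constraint naming preferences above, using hypothetical table and column names: audit columns populated on insert, plus explicitly named primary key, default, and foreign key constraints.

CREATE TABLE dbo.DimGeography
(
    GeographyKey   INT IDENTITY(1,1) NOT NULL,
    City           NVARCHAR(50) NOT NULL,
    StateProvince  NVARCHAR(50) NOT NULL,
    Country        NVARCHAR(50) NOT NULL,
    PostalCode     NVARCHAR(20) NOT NULL,
    InsertDateTime DATETIME2(3) NOT NULL
        CONSTRAINT DF_DimGeography_InsertDateTime DEFAULT (SYSDATETIME()),
    UpdateDateTime DATETIME2(3) NOT NULL
        CONSTRAINT DF_DimGeography_UpdateDateTime DEFAULT (SYSDATETIME()),
    CONSTRAINT PK_DimGeography PRIMARY KEY CLUSTERED (GeographyKey)
);

-- Foreign keys from fact tables get explicit names too, even if you later disable them.
ALTER TABLE dbo.FactSales ADD CONSTRAINT FK_FactSales_DimGeography
    FOREIGN KEY (GeographyKey) REFERENCES dbo.DimGeography (GeographyKey);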

Data Integration Process

  • There should be consistent error handling in each layer (staging, dims, facts, etc.). If you write errors to another location (flat file, database table), have a process that notifies the right people that errors occurred. The process of consuming corrected data must be built, tested, and integrated into the existing process.
  • Make your error handling process reflect what end users need to see when an error occurs. Does it make sense to have a partial load when there is an issue? Or should it be all or nothing?
  • Have smart master packages that determine which packages to run. Don’t check whether the package should run inside of the package itself – do that in the master package.
  • Master packages should execute child packages in parallel as much as possible rather than defaulting to sequential execution.
  • Have an audit log with one row per package. Include the SSIS ServerExecutionID in the audit log – not the package-specific ID but the execution ID for the entire run. If there are incremental loads, the WHERE clause used to filter the load should be captured in the audit table. Include row counts as well as package start and stop times in your audit log.
  • Add an AuditLogID column on your dimension, fact, and staging tables so you can trace each row back to the process that populated it.
  • For dims and facts, perform change detection/deduplication of records, usually through hash values and either SSIS lookups or SQL queries with WHERE NOT EXISTS.
  • Avoid T-SQL MERGE statements. Write individual insert/update/delete statements instead. This avoids any bugs in MERGE and makes your SQL easier to understand and troubleshoot (see the sketch after this list).
  • Use consistent naming of tasks, source, destinations, packages, connection managers, etc. Connection managers pointing to databases should have names that refer to the database rather than the server.
  • If you are downloading files, move the files to an archive folder once files are processed. You can have rules in place if you have retention limits. But you probably need to keep files from at least the last load for audit and troubleshooting purposes. This could change if you are importing very sensitive data.
  • Even if you need to copy all columns from a table, write a select statement for database sources that explicitly names fields rather than using SELECT * or just selecting the table or view.
  • SSIS lookups should use an explicit query rather than referencing an entire table.
  • Implement restartability at the package level for most packages (you should have single-purpose packages executed by a master package). Checkpoints are ineffective within a package. If you build your audit log table correctly, you can get the list of packages that have not run in the last X minutes/hours and feed that to your master package.
  • Send email from your scheduling tool rather than within an SSIS package.
  • Track data lineage in your tables. This can be as simple as having a table that lists all of your data sources with an ID column and including that ID value in each row of your staging, fact, and dimension tables.
  • Dims and facts are not truncated. Data should be inserted and updated (and deleted, if necessary).
  • Connection strings used in multiple packages should be project-level connection strings.
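
Here is a minimal sketch of that change detection approach with plain INSERT and UPDATE statements instead of MERGE. The table, column, and hash names are hypothetical.

-- New rows: the business key does not yet exist in the dimension.
INSERT INTO dbo.DimCustomer (CustomerAlternateKey, CustomerName, ChangeHash, InsertDateTime, UpdateDateTime)
SELECT s.CustomerAlternateKey, s.CustomerName, s.ChangeHash, SYSDATETIME(), SYSDATETIME()
FROM Staging.Customer s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer d
                  WHERE d.CustomerAlternateKey = s.CustomerAlternateKey);

-- Changed rows: the key exists, but the hash of the remaining columns differs.
UPDATE d
SET d.CustomerName = s.CustomerName,
    d.ChangeHash = s.ChangeHash,
    d.UpdateDateTime = SYSDATETIME()
FROM dbo.DimCustomer d
JOIN Staging.Customer s ON s.CustomerAlternateKey = d.CustomerAlternateKey
WHERE d.ChangeHash <> s.ChangeHash;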

Biml Specifics

  • Understand whether you need a flexible Biml Framework or just an accelerator for a current project. If you need flexibility, don’t hardcode connection strings and other things that change when you add/change sources and destinations. If you just need to accelerate development of a simple data mart, total flexibility may be overkill and actually cause more work.
  • Have a single place where you add synthetic metadata, as much as possible. BimlScript gets messy and difficult to understand when you have some extended properties that are read in, some annotations added directly, and some variables defined in your code. This is why I like synthetic metadata stored in a database. Also, extended properties don’t exist in Azure SQL Data Warehouse, so if you need your framework to work there you can’t go that route.
  • Don’t repeat your code in multiple files. If you have some logic that gets reused, move it to a separate file and reference it from other files.

What Do You Think?

What’s on your SSIS preferences list? Do you disagree with one of my preferences and want to share your knowledge? Let’s chat in the comments.

Data Warehousing, Microsoft Technologies, SQL Server, SSIS, T-SQL

Why We Don’t Truncate Dimensions and Facts During a Data Load

Every once in a while, I come across a data warehouse where the data load uses a full truncate and reload pattern to populate a fact or dimension. While it may not be the end of the world for a small table, it does concern me, and I usually recommend redesigning the load. My thoughts below on why this is an anti-pattern apply to the actual TRUNCATE TABLE statement as well as to executing a DELETE statement with no WHERE clause.

Surrogate Keys

Dimensional models use surrogate keys rather than natural keys. Surrogate keys are system-generated, meaningless values that are usually integers used to uniquely identify a record. They provide good performance for joins in queries, allow us to switch or use multiple source systems to feed the same tables, and facilitate the use of slowly changing dimensions. If you truncate a dimension table and then repopulate it, you will end up with different surrogate keys assigned to your dimension values. Let’s say we have a Geography dimension that looks like the table below.

GeographyKey | City   | StateProvince | Country       | PostalCode
1            | Denver | Colorado      | United States | 80205
2            | Dallas | Texas         | United States | 75201
3            | Paris  | Île-de-France | France        | 75001

If I truncate and reload this table, there is no guarantee that Denver will have a GeographyKey value of 1. I might reload the data and then have Paris as 1, Denver as 2, and Dallas as 3. And since we use surrogate keys as dimension lookups in fact tables and bridge tables (and snowflaked dimensions, if you use those), I would now need to update every table that references this dimension. That’s a lot of table updates without a good reason.

Error Handling

Another reason to avoid truncating the tables in your dimensional model is error handling. When you design your data load, you need to think about what should happen when it fails and where it is most likely to fail. Failures will often occur either in data transformation steps or upon inserting/updating values in the destination table due to data type/size conflicts.

Let’s say you truncate your fact table and then you encounter an ETL failure while performing the transformations to reload the table. Now you have an empty fact table. If you have error handling logic in your ETL, you may be able to redirect the error rows to another location to be handled later. But that still leaves you with an incomplete table at best. If instead of truncating and reloading you were doing inserts and updates when you encountered an error, your table might reflect slightly stale data as of the previous load. Depending on your ETL design, you might still have a situation where your table is partially updated with the rows that were inserted before the error was encountered. But having a fact table with old or partially updated data is usually (but not always) preferable to having no data when a data load process fails.

Performance and Data Availability

When you truncate and reload a table, you are assuming you have access to all of the source data needed to fully repopulate the table. This may be true today, but what about next year when you switch CRM systems? Or what if your organization makes the decision to archive data in the source system that is older than 5 years? If any data needed for your fact or dimension becomes unavailable, your truncate and reload pattern will fail to serve your needs.

Even if you are sure your data will be available, you may want to consider your data load times. If you only have a handful of tables with a few hundred thousand rows max, you may load your data warehouse in a few minutes. But what happens when you have a few hundred million rows with some complicated transformation logic in your pipelines? You might be adding minutes to hours to your load times.

Why Do ETL Developers End Up Truncating Fact Tables?

Sometimes developers just don’t know better. But often there were understandable reasons for using the truncate and reload pattern. While I have never built a system where I truncated a dimension, I have had a couple where we at least started out with truncating the fact table. We usually built it to make data initially available with correct values, and then worked with stakeholders to find a different way to access the data or to have them generate the data differently to alleviate the problems that caused us to want to truncate and reload.

Often the cause is data quality issues. On one project I worked on, we had dimension values that defined the granularity of the fact table that would change in the source systems, and the process to try to update them in the fact table was too complicated and took too long. So we made the decision to truncate and reload, understanding the risks of doing so. Having the data available and usable for analysis gave us more information on how useful the data was to users and helped us work to understand why and how our keys were changing. Over time, we were able to influence the way data was entered into source systems so that we didn’t have to go through the truncate and reload process and could perform incremental loads on our fact table.

Sometimes source systems allow hard deletion of data and don’t maintain a list of what was deleted. And depending on access methods and the size of the data, it can be difficult to compare the rows in a fact table with the source data to identify the deleted rows. I would argue this is a poor design of the source system, but we often can’t change that, and we still need to include the data in our data warehouse. So it’s understandable why someone in this situation would want to truncate and reload a fact table.

If your stakeholders are ok with the potential of long load times, empty tables when errors occur, and the assumption that source data will always be available, then there isn’t much problem with truncating and reloading a fact table. But that is often not the case. And that decision should be made explicitly rather than implicitly.

What To Do Instead of Truncating and Reloading a Fact Table

If you’ve been truncating and reloading a fact table because that just seemed like the simplest thing to do, you can change your load pattern.

My general load pattern is:

  1. Truncate the update table
  2. Insert new rows into the fact table and changed rows into the update table
  3. Perform set-based updates on the fact table based upon the data in the update table

I’m able to determine what rows are new, changed, and unchanged by using hash values. I concatenate and hash the values that define the level of uniqueness of the row into one value I call HistoricalHash and the values from the remaining columns into a value I call ChangeHash. In other words, I load a transactional or periodic snapshot fact table in a manner similar to a Type 1 slowly changing dimension.
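
A simplified T-SQL sketch of steps 2 and 3, using hypothetical table names along with the HistoricalHash and ChangeHash columns described above:

-- Step 2a: insert brand-new rows into the fact table.
INSERT INTO dbo.FactSales (HistoricalHash, ChangeHash, SalesAmount, OrderQuantity, InsertDateTime, UpdateDateTime)
SELECT s.HistoricalHash, s.ChangeHash, s.SalesAmount, s.OrderQuantity, SYSDATETIME(), SYSDATETIME()
FROM Staging.Sales s
WHERE NOT EXISTS (SELECT 1 FROM dbo.FactSales f WHERE f.HistoricalHash = s.HistoricalHash);

-- Step 2b: insert changed rows into the (already truncated) update table.
INSERT INTO Updates.FactSales (HistoricalHash, ChangeHash, SalesAmount, OrderQuantity)
SELECT s.HistoricalHash, s.ChangeHash, s.SalesAmount, s.OrderQuantity
FROM Staging.Sales s
JOIN dbo.FactSales f ON f.HistoricalHash = s.HistoricalHash
WHERE f.ChangeHash <> s.ChangeHash;

-- Step 3: set-based update of the fact table from the update table.
UPDATE f
SET f.SalesAmount = u.SalesAmount,
    f.OrderQuantity = u.OrderQuantity,
    f.ChangeHash = u.ChangeHash,
    f.UpdateDateTime = SYSDATETIME()
FROM dbo.FactSales f
JOIN Updates.FactSales u ON u.HistoricalHash = f.HistoricalHash;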

If you have data quality, data deletion, or other issues that prevent you from using a change detection pattern like the above, consider using a staging table and swapping it out with the fact table. Create a staging table that looks exactly like your fact table.

  1. Truncate the staging table
  2. Populate the staging table with all rows that should be in the fact table
  3. Swap the staging table and the fact table (this usually involves renaming the tables)
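
The swap in step 3 can be a quick metadata operation. Here’s a sketch using renames and hypothetical table names:

BEGIN TRAN;
    EXEC sp_rename 'dbo.FactSales', 'FactSales_Old';
    EXEC sp_rename 'dbo.FactSales_Staging', 'FactSales';
    EXEC sp_rename 'dbo.FactSales_Old', 'FactSales_Staging';
COMMIT;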

Understand the Consequences of Your Design

If you are choosing to truncate a dimension or fact table, be sure that you understand the trade-offs. You may be unwittingly opting for simplicity of ETL over data availability and efficient data loads.

Biml, Microsoft Technologies, SSIS

Biml for a Task Factory Dynamics CRM Source Component

I recently worked on a project where a client wanted to use Biml to create SSIS packages to stage data from Dynamics 365 CRM. My first attempt using a script component had an error, which I think is related to a bug in the Biml engine with how it currently generates script components, so I had to find a different way to accomplish my goal. (If you have run into this issue with Biml, please comment so I know it’s not just me! I have yet to get Varigence to confirm it.) This client owned the Pragmatic Works Task Factory, so we used the Dynamics CRM source to retrieve data.

I was ultimately creating a bunch of packages that looked like the below.

There are two pieces to using the Task Factory Dynamics CRM Source: the source component and the connection manager. The code for both is below.


<!-- This goes in your data flow as the source -->
<CustomComponent Name="TF Dynamics 365 <#=table.Name#>" ComponentTypeName="PragmaticWorks.TaskFactory.DynamicsSource" Version="1" ContactInfo="Dynamics Source;Pragmatic Works, Inc; Task Factory (c) 2009 – 2016 Pragmatic Works, Inc; http://www.pragmaticworks.com;support@pragmaticworks.com" UsesDispositions="true">
<Annotations>
<Annotation AnnotationType="Description">Extract data from Microsoft Dynamics CRM.</Annotation>
</Annotations>
<DataflowOverrides>
<OutputPath OutputPathName="Dynamics Source Output">
<Columns>
<!-- Iterate through the columns (I'm passing in table and column from another file), ignoring any audit columns you may have added to your table. Set the Error Row Disposition and Truncation Row Disposition. -->
<# foreach (var column in table.Columns.Where(c => !c.Name.Equals("AuditETLLogID") && !c.Name.Equals("AuditInsertDate"))) { #>
<Column ErrorRowDisposition="NotUsed" TruncationRowDisposition="NotUsed" ColumnName="<#=column.Name#>" />
<# } #>
</Columns>
</OutputPath>
</DataflowOverrides>
<CustomProperties>
<CustomProperty Name="DebugMode" DataType="Boolean">false</CustomProperty>
<CustomProperty Name="FilterQuery" DataType="String"></CustomProperty>
<CustomProperty Name="BatchSize" DataType="Int32" Description="The maximum number of records to retrieve per round trip to the dynamics server">500</CustomProperty>
<CustomProperty Name="FetchMode" DataType="Int32" TypeConverter="PragmaticWorks.TaskFactory.Components.Sources.Dynamics.DynamicsSource+FetchModeEnum, PragmaticWorks.TaskFactory.Components130, Version=1.0.0.0, Culture=neutral, PublicKeyToken=47acf905d0337c39" Description="How will data be retrieved from the entity. Name or Xml Query?">0</CustomProperty>
<CustomProperty Name="Entity" DataType="String" SupportsExpression="true" Description="Name of the entity to retrieve data from"><#=table.Name#></CustomProperty>
<CustomProperty Name="QueryXML" DataType="String" SupportsExpression="true" Description="The XML Query">&lt;filter type="and"&gt;&lt;/filter&gt;</CustomProperty>
<CustomProperty Name="Get Changes" DataType="Boolean" SupportsExpression="true" Description="Get the changes for an entity">false</CustomProperty>
<CustomProperty Name="Changes Token Variable" DataType="String" SupportsExpression="true" Description="The variable that will contain the tracking changes token."></CustomProperty>
</CustomProperties>
<OutputPaths>
<OutputPath Name="Dynamics Source Output">
<OutputColumns>
<!-- Add your columns here -->
<# foreach (var column in table.Columns.Where(c => !c.Name.Equals("AuditETLLogID") && !c.Name.Equals("AuditInsertDate"))) { #>
<# if (column.DataType == System.Data.DbType.Int32) { #>
<OutputColumn
Name="<#=column.Name#>"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } else if (column.DataType == System.Data.DbType.String) { #>
<OutputColumn
Name="<#=column.Name#>"
Length="<#=column.Length#>"
DataType="<#=column.DataType#>"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } else if (column.DataType == System.Data.DbType.Guid) { #>
<OutputColumn
Name="<#=column.Name#>"
DataType="Guid"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } else if (column.DataType == System.Data.DbType.Double) { #>
<OutputColumn
Name="<#=column.Name#>"
DataType="Double"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } else if (column.DataType == System.Data.DbType.Currency) { #>
<OutputColumn
Name="<#=column.Name#>"
DataType="Currency"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } else if (column.DataType == System.Data.DbType.Boolean) { #>
<OutputColumn
Name="<#=column.Name#>"
DataType="Boolean"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } else if (column.DataType == System.Data.DbType.DateTime) { #>
<OutputColumn
Name="<#=column.Name#>"
DataType="DateTime"
ExternalMetadataColumnName="<#=column.Name#>"
ErrorOrTruncationOperation="Conversion"
ErrorRowDisposition="FailComponent"
TruncationRowDisposition="FailComponent" />
<# } } #>
</OutputColumns>
<ExternalColumns>
<!-- Add your columns here -->
<# foreach (var column in table.Columns.Where(c => !c.Name.Equals("AuditETLLogID") && !c.Name.Equals("AuditInsertDate"))) { #>
<# if (column.DataType == System.Data.DbType.Int32) { #>
<ExternalColumn
Name="<#=column.Name#>" />
<# } else if (column.DataType == System.Data.DbType.String) { #>
<ExternalColumn
Name="<#=column.Name#>"
Length="<#=column.Length#>"
DataType="String" />
<# } else if (column.DataType == System.Data.DbType.Guid) { #>
<ExternalColumn
Name="<#=column.Name#>"
DataType="Guid" />
<# } else if (column.DataType == System.Data.DbType.Double) { #>
<ExternalColumn
Name="<#=column.Name#>"
DataType="Double" />
<# } else if (column.DataType == System.Data.DbType.Currency) { #>
<ExternalColumn
Name="<#=column.Name#>"
DataType="Currency" />
<# } else if (column.DataType == System.Data.DbType.Boolean) { #>
<ExternalColumn
Name="<#=column.Name#>"
DataType="Boolean" />
<# } else if (column.DataType == System.Data.DbType.DateTime) { #>
<ExternalColumn
Name="<#=column.Name#>"
DataType="DateTime" />
<# } } #>
</ExternalColumns>
</OutputPath>
</OutputPaths>
<Connections>
<!-- Do not change Name of DYNAMICSCONNECTION -->
<Connection Name="DYNAMICSCONNECTION" ConnectionName="TF_DynamicsCRM" />
</Connections>
</CustomComponent>
<!-- This goes in your environments file or elsewhere, just including to keep this with the source component -->
<CustomSsisConnection Name="TF_DynamicsCRM" CreateInProject="true" CreationName="DynamicsCrmConnectionManager" ObjectData="&lt;DynamicsConnectionManager
ConnectionString=&quot;UseProxy=False;UseConnectionSharing=False;ServerType=3;CRMVersion=4;OrganizationName=MyOrgName;ProxyHost=;ProxyPort=0;ProxyUser=;ReuseSameConnection=False;ServerHost=disco.crm.dynamics.com;ServerTimeout=60;ServerUser=Myuser@domain.com;&quot;&gt;
&lt;ServerPassword
Sensitive=&quot;1&quot; xmlns=&quot;www.microsoft.com/SqlServer/Dts&quot;
p4:Salt=&quot;mOECnMOaUg==&quot;
p4:IV=&quot;Y91tUjUbZZk=&quot;
xmlns:p4=&quot;www.microsoft.com/SqlServer/SSIS&quot;&gt;DxApS7rkG1qIrWFNXW7lVxhP1NCf5ERQLLuBUoIgh+0Qq2h3j8EltnVddZLUKDcYTisgAp5dUICR827d5VaHqsn2Q2SbWB2Q2XVL5pzd38/E4j+vds1beozDzD10OLdXxql11vCvEE0=&lt;/ServerPassword&gt;
&lt;/DynamicsConnectionManager&gt;"/>

Things to note for the source component:

  • This component pulls data for an entity in Dynamics 365. If you were to populate this by hand, the Task Factory UI would ask you which entity. There are several entities that are visible but do not allow you to retrieve their data. You can find a list of internal entities here. You won’t want to stage data from those tables. I generated all of my packages at once with Biml, using a blacklist to exclude the internal entities.
  • The entity name supplied in the Entity custom property should be the entity logical name.
  • I’m iterating through columns in tables in a separate file and holding them in memory, hence the references to table.Columns. You’ll need to write some code to get the entity metadata to feed that, or find another way to provide the table names, column names, and data types.
  • You must set the ErrorRowDisposition and TruncationRowDisposition to “NotUsed” for each column in the DataflowOverrides section (as shown above) or your package will not work.
  • In OutputColumns and ExternalColumns collections, I just have if statements to check data types. There may be a more elegant way to do this, but this worked for me and I went with it. Feel free to leave suggestions in the comments if you have a better idea.
  • The Connection element near the end of the source component should keep the name “DYNAMICSCONNECTION”. If you change it, the connection won’t work. In my actual project, my connection manager is in a separate file and is named “TF_DynamicsCRM”. You can set the ConnectionName property to whatever you want, as long as it matches the name of your connection definition (the CustomSsisConnection element at the bottom of the code above).

Things to note for the Connection Manager:

  • I tried reverse engineering the connection manager in Biml Studio, and that got me close, but it didn’t get the ObjectData property quite right. I ended up having to create one manually, view the code for it in SSDT, and copy it into my Biml file. I have my packages and project set to EncryptSensitiveWithPassword, which is why you see the p4:Salt value in the ServerPassword element. Your connection manager will have different Salt, IV, and encrypted password values. If you set your project to EncryptSensitiveWithPassword, these values will stay consistent across developers. But if you use EncryptSensitiveWithUserKey, they may change, which will be fun for regenerating with Biml. So make sure you plan for that if you have multiple developers or multiple computers on which you are working.
  • This connection manager is set to connect to Dynamics 365, hence the ServerHost=disco.crm.dynamics.com. If you have an on-prem or hosted environment that isn’t Dynamics 365, you’ll have to find the correct ServerHost value and put it in the ConnectionString.

I hope that helps someone looking to generate the Task Factory Dynamics Source with Biml.

Excel, Microsoft Technologies, Power BI, SSIS

Installing the Microsoft.ACE.OLEDB.12.0 Provider for Both 64-bit and 32-bit Processing

I recently got a new laptop and had to go through the ritual of reinstalling all my programs and drivers. I sometimes work with SSIS locally to import data from Excel, and I occasionally do demos with Power BI where I read from an Access database, so I needed to install the ACE OLE DB provider. If you try to do either of those things without the ACE OLE DB provider, you will get a message like the one below.

The ‘Microsoft.ACE.OLEDB.12.0’ provider is not registered on the local machine.

I needed both the 32-bit and the 64-bit adapters to cover my various use cases. I started by installing the 64-bit Microsoft Access Database Engine 2010 Redistributable.  If you have ever attempted this, you know there is a secret. You cannot install the 2010 Access database engine with a different processing architecture (bittedness) from your Office install (or install both versions of the 2010 Access database engine) without a workaround.  If you try, you will get an error message.

ACE engine bittedness error

There are two options for getting the other version installed.

Option A: Install the 2007 Office System Driver: Data Connectivity Components

The 2007 Access database engine can read the same format, but didn’t have a 64-bit install. It doesn’t have the check to see what other products are on your machine and can co-exist peacefully with the 64-bit 2010 Access engine.

Option B: Install from the Command Line and Use the Passive Switch

  1. Open the command prompt.
  2. Type the file path and file name for your Access Database Engine install file.
  3. Follow it with a space and then “/passive” and then hit Enter.
  4. Delete or rename the mso.dll registry value in the following registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Common\FilesPaths

Note: A helpful commenter has informed me that for Access 2016 the switch is /quiet rather than /passive.

ACE install cmd
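
For reference, steps 2 and 3 (and the registry cleanup in step 4) look something like the lines below when run from an elevated command prompt. The installer path and file name are placeholders for whatever you downloaded, and reg delete will ask for confirmation unless you add /f.

C:\Downloads\AccessDatabaseEngine_X64.exe /passive
reg delete "HKLM\SOFTWARE\Microsoft\Office\14.0\Common\FilesPaths" /v mso.dll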

I wrote up a document for this a while back for a client, but I can never remember where I put it. So I’m posting this on my blog to make this info easier to find for me as well as you.

Azure Data Factory, Biml, Conferences, SSIS

I’m Speaking at IT/Dev Connections 2017

I’m pleased to say that I am speaking at IT/Dev Connections 2017. This year the conference will be held in San Francisco October 23-26. I had a great experience speaking at IT/Dev Connections in 2015, so I am excited to return again this year.

This conference is special to me because of its focus on providing great content for developers and IT pros – the conference website describes it as the “anti-keynote” conference with no forced marketing content.

I also enjoy it because it is more than just SQL Server/Data Platform (they have tracks for Cloud & Data Center, Enterprise Collaboration, Development & Dev Ops, and Enterprise Mobility and Security), and it’s nice to get out of my comfort zone a bit.

I will deliver two sessions at the conference.

Azure Data Factory in A Nutshell

If you have been wanting to get into Azure Data Factory (ADF) development, join me for this demo-filled overview. In this session, we’ll go over the basic anatomy of an ADF solution. You’ll learn what ADF is and isn’t as we walk through a solution to pull data from an on-premises SQL Server database into blob storage and then populate an Azure SQL Data Warehouse. You’ll learn tips for creating ADF solutions in Visual Studio, and I’ll show you how to make ADF development less tedious with a free Visual Studio add-in called Biml Express. You’ll leave with a basic understanding of ADF and a list of tools and skills you’ll want to acquire as you begin your ADF development.

Improve Data Warehouse ETL Delivery with a Patterns-Based Approach

What if I told you that 90% of your data integration development in SQL Server could be automated? In 5 years, you will be “old fashioned” if you are hand coding SSIS packages. Developers with different skill levels and design preferences create databases and SSIS packages however they see fit to get the job done. Documentation is frequently omitted. Maintenance and small enhancements consume too much development time while manual errors and inconsistencies slip through the testing and release process. You can use tools and frameworks to rearrange the development process and alleviate these common problems. The implementation and automation of design patterns leads to improved efficiency and communication. Join me in this session to learn how to use Business Intelligence Markup Language (Biml) and Excel to facilitate metadata-driven SSIS development. I’ll use database schema information plus Excel inputs to implement a small data mart from staging through the dimensional model.

I hope you will join me in San Francisco in October!

Microsoft Technologies, SSAS, SSIS

Process Compatibility Level 1200 SSAS Tabular Model from SSIS 2014

A client wanted to upgrade their SSAS model to SSAS 2016 to take advantage of some of the features of the new level 1200 compatibility model. But they weren’t yet ready to upgrade their SSIS server from SQL 2014. This presented a problem because they had been using the Analysis Services Processing Task to process their tabular model nightly. This processing task in SSIS 2014 uses the old Analysis Management Objects, which aren’t compatible with the new SSAS tabular models.

Attempting to use the AS Processing Task results in the following error: “[Analysis Services Execute DDL Task] Error: This command cannot be executed on database ‘MySSASDB’ because it has been defined with StorageEngineUsed set to TabularMetadata. For databases in this mode, you must use Tabular APIs to administer the database”

The reason for keeping SSAS processing in an SSIS package was because it kept consistent logging throughout their data refresh process. So we set out to find another solution.

The new SSAS Tabular models use the Tabular Model Scripting Language (TMSL), which is JSON-based, rather than XMLA. A simple process full command in TMSL might look like this:

{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "MySSASDB"
      }
    ]
  }
}

Just pasting the JSON to process the model into an Analysis Services Execute DDL Task didn’t work. It returned the error “DDL is not valid”.

I asked around and received a couple of solutions that worked.

Option 1 (my less preferred option): Create an OLEDB connection manager (rather than MSOLAP) and use an Execute SQL Task.

You can set up an OLE DB connection that looks like the one below. Create a new connection manager and choose the OLE DB connection manager type. Change the provider to Microsoft OLE DB Provider for Analysis Services 13.0 and fill in your connection information.

ms-oledb-provider-for-as-conmgr

You’ll see the connection manager show up in the Connection Managers pane looking like this:

ssas-oledb-prov

Drag in an Execute SQL task, use the previously defined connection manager and paste in the JSON. This works just fine, but feels a bit too much like a workaround rather than a solid solution to me.

Option 2: Use an Analysis Services Execute DDL task and wrap the JSON in XMLA

Drag in an Analysis Services Execute DDL task. Create a new connection manager by choosing New Analysis Services Connection.

ssis-new-ssas-con

Edit your connection information, click OK, and you will end up with a connection manager that looks like this:

ssis-ssas-con-mgr

Use that connection manager in the AS Execute DDL task. We can use the same JSON from earlier and wrap it in XMLA as shown below.

<Statement xmlns="urn:schemas-microsoft-com:xml-analysis">
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "MySSASDB"
      }
    ]
  }
}

</Statement>

The XMLA/JSON command can be a direct source statement or placed in a variable and referenced from the task.

To test that the model is successfully processed, you can execute the SSIS task or package and then run the following query against the DMV.

Select [catalog_name], [date_modified], [compatibility_level] 
from $SYSTEM.DBSCHEMA_CATALOGS where [Catalog_Name] = 'MySSASDB'

So if you are caught between versions in SSIS and SSAS, do not despair. You can still process your new SSAS Tabular model from an SSIS package.

Biml, Microsoft Technologies, SSIS

BimlScript – Get to Know Your Control Nuggets

This is post #2 of my BimlScript – Get to Know Your Code Nuggets series. To learn about text nuggets, see my first post.  

The next type of BimlScript code nugget I’d like to discuss is the control nugget. Control nuggets allow you to insert control logic that determines what Biml is generated and used to create your BI artifacts (packages, cubes, etc.). Control nuggets can be used to:

  • Add conditional logic so that specified Biml fragments are only generated when certain criteria are met
  • Add loops to repeat Biml fragments multiple times
  • Define variables that will be used later
  • Access external data from databases, flat files or other sources that will be used in the generation of the final Biml code

Control nuggets start with <# and end with #>. Just like text nuggets, control nuggets can be a single line or multiple lines. And they can contain simple or complex logic.

There were actually control nuggets in the text nugget example from the previous post. The variable declarations at the top of the file (reproduced below) are control nuggets.


<#* The variables below are control nuggets.
They will be discussed later but are needed for this example.*#>
<#
var PackageName = "Stage_AW_SalesReason";
var SourceTable = "SalesReason";
var SourceSchema = "Sales";
var DestinationTable = "SalesReason";
var DestinationSchema = "Staging";
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="AW2014"
ConnectionString = "Data Source=.\SQL2014;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;Provider=SQLNCLI11.1;"/>
<OleDbConnection Name="MyDataMart"
ConnectionString = "Data Source=.\SQL2014;Initial Catalog=AWBIML;Integrated Security=SSPI;Provider=SQLNCLI11.1;"/>
</Connections>
<Packages>
<Package Name="<#=PackageName#>" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
<Tasks>
<ExecuteSQL Name="SQL Truncate_<#=DestinationTable#>" ConnectionName="AW2014">
<DirectInput>Truncate Table [<#=DestinationSchema.ToUpper()#>].[<#=DestinationTable.ToUpper()#>];</DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT Stage_<#=DestinationTable#>">
<Transformations>
<OleDbSource Name="OLE_SRC <#=SourceTable#>" ConnectionName="AW2014" >
<DirectInput>SELECT * FROM [<#=SourceSchema.ToUpper()#>].[<#=SourceTable.ToUpper()#>];
</DirectInput>
</OleDbSource>
<OleDbDestination Name="OLE_SRC <#=DestinationSchema#><#=DestinationTable#>" ConnectionName="MyDataMart">
<ExternalTableOutput Table="<#=DestinationSchema#>.<#=DestinationTable#>"></ExternalTableOutput>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

Below is another example file that uses code nuggets. I like to separate my design patterns, project connections, and package generation into separate files. The BimlScript shown below is from a slightly altered “caller file”, the file that I would execute to create the SSIS packages that should follow a specified design pattern. This caller file generates Type 1 slowly changing dimensions.


<#*This file retrieves variable values from a database
and passes them to another Biml file that contains
the design pattern for a type 1 SCD*#>
<#*The items directly below this comment are directives*#>
<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.IO" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!--
<#
string AuditConnectionString = "Data Source=.\\SQL2014;Initial Catalog=AWBIML;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";
string SrcTableQuery = @"
SELECT
[PackageName]
,[DestinationSchemaName]
,[DestinationTableName]
,[DestinationConnection]
,[DataFlowSourceName]
,[SourceConnection]
,[DataFlowQuery]
,[UpdateSchemaName]
,[UpdateTableName]
,[UpdateConnection]
,[UpdateSQLStatement]
,[AuditConnection]
,[AuditSchema]
FROM [BIML].[DimensionPackageVariables]
where PackageType ='Dim1'
;
";
DataTable dt = null;
dt = ExternalDataAccess.GetDataTable(AuditConnectionString, SrcTableQuery);
#>
-->
<#@ include file="ProjectConnections.biml" #>
<Packages>
<# foreach (DataRow dr in dt.Rows) { #>
<#=CallBimlScript("Dim1.biml", dr[0].ToString(), dr[1].ToString(), dr[2].ToString(), dr[3].ToString(), dr[4].ToString(), dr[5].ToString(), dr[6].ToString(), dr[7].ToString(), dr[8].ToString(), dr[9].ToString(), dr[10].ToString(), dr[11].ToString(), dr[12].ToString())#>
<# } #>
</Packages>
</Biml>

In my Biml framework, I store the data I need in order to generate my packages in SQL tables. You can see the control nugget near the top of the file retrieving data from my database and storing it in a data table for later use.

Inside the Packages element, there is a control nugget containing a foreach loop. For each row in the data table, it calls a BimlScript file that creates a package using my Type 1 SCD design pattern with the variables from that row. Notice that the ending curly brace for my loop is in a separate control nugget.

There are many different ways to use control nuggets that aren’t covered here.  From these two examples you can start to see how I might combine text nuggets and control nuggets to automate my SSIS package creation and employ consistent design patterns. But there are still a few missing pieces that will be filled in when I cover the remaining code nugget types.

Biml, Microsoft Technologies, SSIS

BimlScript – Get to Know Your Code Nuggets

In BimlScript, we embed nuggets of C# or VB code into our Biml (XML) in order to replace variables and automate the creation of our BI artifacts (databases, tables, SSIS packages, SSAS cubes, etc.). Code nuggets are a major ingredient in the magic sauce that is metadata-driven SSIS development using BimlScript.

There are 5 different types of code nuggets in BimlScript:

  • Text nuggets
  • Control nuggets
  • Class nuggets
  • Comments
  • Directives

Over the next several posts I’ll cover each type of code nugget and provide examples of their use.

Text Nuggets

Text nuggets evaluate the expression they contain and then replace the text nugget with the string representation of the value of the expression. I use them often to switch out names of packages, tasks, and components as well as source and destination tables in SSIS development when creating packages based upon a design pattern.

Text nuggets start with <#= and end with #>. Notice there is an equals sign at the beginning of the text nugget but not at the end.

Text nuggets are very useful. You can include complex business logic in the expressions. And the expression result can be any data type. The BimlScript compiler will automatically convert it to text before replacing the code nugget with the result. Like all code nuggets, text nuggets can be a single line or multiple lines.

Let’s take a look at some examples.

Note: The Biml script below has variables, connections, and a package all in one file for ease of demonstration. Normally I would split these out into separate files. 


<#* The variables below are control nuggets.
They will be discussed later but are needed for this example.*#>
<#
var PackageName = "Stage_AW_SalesReason";
var SourceTable = "SalesReason";
var SourceSchema = "Sales";
var DestinationTable = "SalesReason";
var DestinationSchema = "Staging";
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="AW2014"
ConnectionString = "Data Source=.\SQL2014;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;Provider=SQLNCLI11.1;"/>
<OleDbConnection Name="MyDataMart"
ConnectionString = "Data Source=.\SQL2014;Initial Catalog=AWBIML;Integrated Security=SSPI;Provider=SQLNCLI11.1;"/>
</Connections>
<Packages>
<Package Name="<#=PackageName#>" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
<Tasks>
<ExecuteSQL Name="SQL Truncate_<#=DestinationTable#>" ConnectionName="AW2014">
<DirectInput>Truncate Table [<#=DestinationSchema.ToUpper()#>].[<#=DestinationTable.ToUpper()#>];</DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT Stage_<#=DestinationTable#>">
<Transformations>
<OleDbSource Name="OLE_SRC <#=SourceTable#>" ConnectionName="AW2014" >
<DirectInput>SELECT * FROM [<#=SourceSchema.ToUpper()#>].[<#=SourceTable.ToUpper()#>];
</DirectInput>
</OleDbSource>
<OleDbDestination Name="OLE_SRC <#=DestinationSchema#><#=DestinationTable#>" ConnectionName="MyDataMart">
<ExternalTableOutput Table="<#=DestinationSchema#>.<#=DestinationTable#>"></ExternalTableOutput>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

This BimlScript creates an SSIS package that truncates and reloads data in a single table.

Notice that you can use the text nuggets alone, together, or with literal text.

The <#=PackageName#> text nugget simply replaces the value for the name of the package with the current value of the PackageName variable I set at the top of the file.

The DataFlow Task name is a combination of text and a code nugget.

<Dataflow Name="DFT Stage_<#=DestinationTable#>">

This comes in handy if you would like to enforce certain naming standards. My personal preference is to use naming conventions suggested by Jamie Thomson, so I build them into my Biml and then use text nuggets as shown above to change the rest of the name to reflect the nature of the current task.

The truncate table statement in the Execute SQL Task contains text and two code nuggets that reference a variable and use a function to convert the variable value into all uppercase text:

Truncate Table [<#=DestinationSchema.ToUpper()#>].[<#=DestinationTable.ToUpper()#>];
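
With the variable values set at the top of this example, that nugget generates the following statement in the package:

Truncate Table [STAGING].[SALESREASON];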

Hopefully you can see how text nuggets are useful when creating multiple packages at the same time. They allow you to switch out the values that change for each package and create consistent naming standards.

To get to know other types of code nuggets, see Get To Know Your Control Nuggets.