Azure Data Factory, Biml, Conferences, SSIS

I’m Speaking at IT/Dev Connections 2017

I’m pleased to say that I am speaking at IT/Dev Connections 2017. This year the conference will be held in San Francisco, October 23-26. I had a great experience speaking at IT/Dev Connections in 2015, so I am excited to return this year.

This conference is special to me because of its focus on providing great content for developers and IT pros – the conference website describes it as the “anti-keynote” conference with no forced marketing content.

I also enjoy it because it is more than just SQL Server/Data Platform (they have tracks for Cloud & Data Center, Enterprise Collaboration, Development & Dev Ops, and Enterprise Mobility and Security), and it’s nice to get out of my comfort zone a bit.

I will deliver two sessions at the conference.

Azure Data Factory in A Nutshell

If you have been wanting to get into Azure Data Factory (ADF) development, join me for this demo-filled overview. In this session, we’ll go over the basic anatomy of an ADF solution. You’ll learn what ADF is and isn’t as we walk through a solution to pull data from an on-premises SQL Server database to blob storage and then populate an Azure SQL Data Warehouse. You’ll learn tips for creating ADF solutions in Visual Studio, and I’ll show you how to make ADF development less tedious with a free Visual Studio Add-in called Biml Express. You’ll leave with a basic understanding of ADF and a list of tools and skills you’ll want to acquire as you begin your ADF development.

Improve Data Warehouse ETL Delivery with a Patterns-Based Approach

What if I told you that 90% of your data integration development in SQL Server could be automated? In 5 years, you will be “old fashioned” if you are hand-coding SSIS packages. Developers with different skill levels and design preferences create databases and SSIS packages however they see fit to get the job done. Documentation is frequently omitted. Maintenance and small enhancements consume too much development time while manual errors and inconsistencies slip through the testing and release process. You can use tools and frameworks to rearrange the development process and alleviate these common problems. The implementation and automation of design patterns lead to improved efficiency and communication. Join me in this session to learn how to use Business Intelligence Markup Language (Biml) and Excel to facilitate metadata-driven SSIS development. I’ll use database schema information plus Excel inputs to implement a small data mart from staging through the dimensional model.

I hope you will join me in San Francisco in October!

Azure, Azure Data Factory, Microsoft Technologies

Azure Data Factory and the Case of the Missing JRE That Wasn’t

Note: This post was written about Azure Data Factory V1, but is also applicable to V2.
On a recent project I used Azure Data Factory (ADF) to retrieve data from an on-premises SQL Server 2014 instance and land it in Azure Data Lake Store (ADLS) as ORC files. This required the use of the Data Management Gateway (DMG). Setup was quick and easy in our development environment. We installed the DMG for development on a separate server in the client’s network, where we also installed SQL Server Management Studio (SSMS) for query development and data validation. We set up resource groups in Azure for development and production, and made sure the settings for development and production were the same. Then we set up a separate server for the production DMG.

Deployment and execution went well in the dev environment. Testing was completed, so we deployed to prod. Deployment went fine, but the pipelines failed during execution and returned the following error on the output datasets:

Java Runtime Environment is not found.

The Java Runtime Environment (JRE) is not required for the DMG to run successfully, but it is needed for the creation of ORC files. The only problem with this error message was that we did indeed have the JRE installed on the server.

I reinstalled the JRE and the DMG and got the same error, so I consulted the troubleshooting guide. Finding nothing relevant there, I asked some colleagues for suggestions, which led to the following checks:

  • I double-checked that I had the same version of the DMG that worked in dev and that I had the most current version of the JRE.
  • I double-checked that the DMG and JRE had the same bitness (32-bit vs. 64-bit). Both were 64-bit in my case.
  • I checked that JAVA_HOME was set correctly in the environment variables.
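
If you want to confirm bitness without relying on installer names, one option is to read the header of the binaries themselves. The following is a minimal Python sketch, not part of the original troubleshooting: the function name pe_bitness is my own, and it assumes the files are ordinary Windows PE binaries, whose header records the target CPU in a 2-byte Machine field.

```python
import struct

# Machine field values defined by the PE/COFF format.
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_bitness(path):
    """Return a label for the CPU architecture a Windows DLL or EXE targets.

    Hypothetical helper for illustration; it only reads the file header.
    """
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError(f"{path} is not a PE file (no MZ signature)")
        # Offset 0x3C holds the file offset of the PE signature.
        f.seek(0x3C)
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\x00\x00":
            raise ValueError(f"{path} has no PE signature")
        # The 2-byte Machine field follows the 4-byte signature.
        (machine,) = struct.unpack("<H", f.read(2))
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")
```

You would call it once on jvm.dll under your JRE folder and once on the gateway executable (the paths depend on your installation) and compare the two labels.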

When none of those things worked, I logged a support ticket with Microsoft. They had me do the following:

  • Check the registry key: HKEY_LOCAL_MACHINE\Software\JavaSoft\Java Runtime Environment should have a CurrentVersion entry that shows the current JRE version.
  • Check that the subkey named for that version has a JavaHome entry with the correct path (something like C:\Program Files\Java\jre1.8.0_74).
  • Open the path and check that the bin folder exists.
  • Check that jvm.dll exists in the bin\server folder.
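
If you need to repeat these checks on more than one server, they can be scripted. Here is a rough Python sketch under a few assumptions: the function names are mine, the registry value names CurrentVersion and JavaHome are the spellings I would expect to find under that key, and the registry part only runs on Windows. A 32-bit Python on 64-bit Windows may be redirected to a different registry view, so treat the result as a hint rather than proof.

```python
from pathlib import Path

JRE_KEY = r"SOFTWARE\JavaSoft\Java Runtime Environment"


def read_java_home_from_registry():
    """Return (version, java_home) as recorded in the registry. Windows only."""
    import winreg  # standard library, but only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, JRE_KEY) as key:
        version, _ = winreg.QueryValueEx(key, "CurrentVersion")
        # The per-version subkey is named after the version string.
        with winreg.OpenKey(key, version) as version_key:
            java_home, _ = winreg.QueryValueEx(version_key, "JavaHome")
    return version, java_home


def check_jre_layout(java_home):
    """Return a list of problems found under java_home (empty if none)."""
    home = Path(java_home)
    problems = []
    if not (home / "bin").is_dir():
        problems.append("bin folder is missing")
    if not (home / "bin" / "server" / "jvm.dll").is_file():
        problems.append("bin/server/jvm.dll is missing")
    return problems
```

On the gateway server you would pass the second value returned by read_java_home_from_registry() to check_jre_layout() and look for an empty list. In my case all of these checks passed, which is why they did not lead to the fix.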

When none of those things worked, they gave me one last suggestion:

Install the Microsoft Visual C++ 2010 Redistributable Package.

And that turned out to solve the problem!

After review, we realized that we had installed SSMS on the dev DMG server but not on the prod DMG server. SSMS would have required the installation of the C++ redistributable package, which is why we didn’t encounter this error in dev.
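
To catch this kind of drift between dev and prod before deployment, you could look for the runtime library on each server. This Python sketch is my own addition and rests on one assumption: that the file of interest is msvcr100.dll, the C runtime that the Visual C++ 2010 redistributable installs. The function names and the choice of System32 as the default search folder are illustrative.

```python
import os
from pathlib import Path


def default_search_dirs():
    """Folders to search by default; assumes a standard Windows layout."""
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    return [Path(system_root) / "System32"]


def find_dll(dll_name, search_dirs):
    """Return every file named dll_name (case-insensitive) in search_dirs."""
    wanted = dll_name.lower()
    hits = []
    for folder in map(Path, search_dirs):
        if not folder.is_dir():
            continue  # skip folders that do not exist on this machine
        hits.extend(
            p for p in folder.iterdir()
            if p.is_file() and p.name.lower() == wanted
        )
    return hits
```

Running find_dll("msvcr100.dll", default_search_dirs()) on both DMG servers and comparing the results would have shown the difference between our dev and prod machines.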

I will confess that I don’t understand exactly why missing C++ libraries manifest themselves in an error claiming a missing Java Runtime Environment. If you have a good explanation, please leave it in the comments and I’ll update this and give you credit.

I hope that someone else who runs into this issue will find this blog post and avoid days of troubleshooting and confusion.