BYO Time Zone Conversion With SSAS DMVs

This is just a quick note (since I apparently forgot and was puzzled for a moment) that the MDSCHEMA_CUBES DMV shows you LAST_DATA_UPDATE (the last processed date) in UTC, regardless of the time zone of your SSAS server. Marco Russo has a great post about all the ways you can get your SSAS model’s last processed date, and it does note that the DMV returns the datetime in UTC.

You can get your SSAS model’s last processed time by running this query (replacing MySSASDBName with the name of your SSAS database):

SELECT TOP 1 [CATALOG_NAME], [LAST_DATA_UPDATE]
FROM $System.MDSCHEMA_CUBES
WHERE [CATALOG_NAME] = 'MySSASDBName'

The datetime returned by this query is in UTC. My query returns 9/19/2016 7:43:03 PM.

If I go into the properties of my SSAS database, I can see this same information, but the time zone conversion has already been done for me (this server is in the Central time zone).

SSAS database properties

So if you have ever gotten results that seem to be in the future or wondered why these two times do not match, it’s probably due to time zone conversion.
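If you want to do the conversion yourself and you have captured the DMV value in SQL Server (2016 or later), a minimal sketch using AT TIME ZONE looks like the following. The variable and the target time zone below are just placeholders for illustration, not part of the DMV query itself.

DECLARE @LastDataUpdateUTC datetime2 = '2016-09-19 19:43:03'; -- value returned by the DMV

SELECT
    @LastDataUpdateUTC AS LastDataUpdateUTC,
    -- Tell SQL Server the value is UTC, then convert it to the server's local time zone.
    @LastDataUpdateUTC AT TIME ZONE 'UTC' AT TIME ZONE 'Central Standard Time' AS LastDataUpdateCentral;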

DAX Date Dimension and Fun with Date Math

I was working on an SSAS Tabular 2016 solution for a project for which I had no data yet (I had an empty data model, but no source data). I was not in control of the source data warehouse, so I couldn’t change what I had, but I needed to get started. So I went about creating a date dimension with a fiscal calendar entirely in DAX. There are several scripts out there to create a date dimension. Although there are many nice alternatives for Power BI built in M or a combination of M and DAX, I haven’t seen many full date dimensions with a fiscal calendar built just in DAX.

I’ll explain some of the DAX calculations below, but I put the TMSL/JSON in a Gist so you can easily paste it into an SSAS 2016 Tabular project. The JSON has the added benefit of retaining the data type, format, hidden, and sort by properties.

The fiscal calendar assumes calendar months with the fiscal calendar shifted by some number of months. It also assumes that the fiscal year is ahead of the calendar year; that is, fiscal year 2017 starts on the first day of some month in 2016. In my example script, the fiscal year starts in October. If you have a 4-4-5, 4-5-4, or other non-calendar-month fiscal calendar, the fiscal calendar calculations in this script won’t help you.

To add this date dimension to your SSAS Tabular project:

  1. Right click on the Model.bim file in the Solution Explorer and choose View Code.
  2. If you have at least one table in the model already, locate the end of the definition of the previous table, inside the tables array. Add a comma after the end curly brace for the previous table (the sketch after this list shows where the new table definition lands).
  3. Paste the JSON/TMSL from my Gist.
  4. Save and close the file.
  5. Right click on Model.bim and choose View Designer.
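For orientation, here is a highly simplified sketch of the Model.bim structure (the real file has many more properties, and these names are illustrative). The second entry in the tables array is where the pasted Date table definition from the Gist ends up, and the comma after the existing table’s closing brace is the one added in step 2.

{
  "name": "SemanticModel",
  "compatibilityLevel": 1200,
  "model": {
    "tables": [
      {
        "name": "ExistingTable"
      },
      {
        "name": "Date"
      }
    ]
  }
}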


To adjust the date range included in the Date table, locate the [Date] field. The current calculation is:

=CALENDAR(DATE(2010,1,1),DATE(2025,12,31))

The current date range goes from 1 Jan 2010 through 31 Dec 2025.

To adjust the start of the fiscal year, change the expression in the [Fiscal Year Month Begin] field. The current calculation is:

=10

This field represents the calendar month number in which the fiscal year begins. If your fiscal calendar begins in July, you’ll want to change the 10 to 7.

You shouldn’t have to adjust anything else in the date dimension to get correct values.

Calculation Explanations

I used the FORMAT() function to get a lot of the calendar fields as well as the [DateKey] field.

DateKey = VALUE(FORMAT([Date],"YYYYMMDD"))

Month Year Abbrev = FORMAT([Date],"mmm YYYY")

Month Name = FORMAT([Date],"mmmm")

Day of Week = FORMAT([Date],"dddd")

Qtr Nbr = VALUE(FORMAT([Date],"Q"))

Many of the fiscal calendar fields use the EDATE() function to shift date calculations to the correct month. The [Fiscal Year Nbr] field uses the [Fiscal Year Month Begin] field to ensure the correct year is shown. For example, with the October fiscal year start, 15 Nov 2016 lands in fiscal month 2 and fiscal year 2017.

Month Shift = 13 - [Fiscal Year Month Begin]

Fiscal Month Nbr = MONTH(EDATE([Date],[Month Shift]))

Fiscal Quarter Nbr = ROUNDUP(MONTH(EDATE([Date],[Month Shift]))/3,0)

Fiscal Year Nbr = IF([Month Nbr] >= VALUE(FORMAT(DATE([Year],
[Fiscal Year Month Begin],1),"M")), [Year]+1, [Year])

I also like to include the beginning and end dates for month, quarter, and year.

Begin of Month = DATE([Year],[Month Nbr],1)

End of Month = EOMONTH([Date],0)

Begin of Quarter = IF([Month Nbr]<4,DATE([Year],1,1),
IF([Month Nbr]<7,DATE([Year],4,1),
IF([Month Nbr]<10,DATE([Year],7,1),
DATE([Year],10,1))))

End of Quarter = IF([Month Nbr]<4,DATE([Year],3,31),
IF([Month Nbr]<7,DATE([Year],6,30),
IF([Month Nbr]<10,DATE([Year],9,30),
DATE([Year],12,31))))

Begin of Year = DATE([Year],1,1)

End of Year = DATE([Year],12,31)

DAX now has a nice EOMONTH() function to get the end of the month. I also included begin and end dates for the fiscal quarter and year below (end of month is the same, since my calendar assumes fiscal months are calendar months). There are multiple ways to get the end of quarter; I used nested IF() statements, but you could also use SWITCH().
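In case it’s helpful, here is a rough sketch of an equivalent SWITCH() expression for the [Begin of Quarter] column. This alternative is not in the Gist; the script itself uses the nested IF() version shown above.

Begin of Quarter = SWITCH(TRUE(),
[Month Nbr]<4, DATE([Year],1,1),
[Month Nbr]<7, DATE([Year],4,1),
[Month Nbr]<10, DATE([Year],7,1),
DATE([Year],10,1))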

Begin of Fiscal Year = DATE([Fiscal Year Nbr]-1,[Fiscal Year Month Begin],1)

End of Fiscal Year = EOMONTH(
DATE([Fiscal Year Nbr],[Fiscal Year Month Begin],1),-1)

Begin of Fiscal Quarter = IF([Fiscal Quarter Nbr]=1,[Begin of Fiscal Year],
IF([Fiscal Quarter Nbr] = 2, EDATE([Begin of Fiscal Year],3),
IF([Fiscal Quarter Nbr] = 3, EDATE([Begin of Fiscal Year],6),
EDATE([Begin of Fiscal Year],9))))

End of Fiscal Quarter = IF([Fiscal Quarter Nbr]=1,
EDATE([Begin of Fiscal Year],3)-1,
IF([Fiscal Quarter Nbr] = 2, EDATE([Begin of Fiscal Year],6)-1,
IF([Fiscal Quarter Nbr] = 3, EDATE([Begin of Fiscal Year],9)-1,
EDATE([Begin of Fiscal Year],12)-1)))

The EOMONTH() function allows you to shift forward or backward from the currently selected month. For the end of the fiscal year, I went to the beginning of the next fiscal year, subtracted a month, and then found the end of that month. The end of fiscal quarter calculation follows similar logic: get the month that begins the next fiscal quarter and subtract one day. For example, with the October fiscal year start, fiscal Q1 runs October through December, so EDATE([Begin of Fiscal Year],3) returns January 1 and subtracting one day gives December 31.

 

Create a Date Dimension in Azure SQL Data Warehouse

Most data warehouses and data marts require a date dimension or calendar table. Those of us who have been building data warehouses in SQL Server for a while have collected our favorite scripts to build out a date dimension. For a standard date dimension, I am a fan of Aaron Bertrand’s script posted on MSSQLTips.com. But the current version (as of Aug 8, 2016) of Azure SQL Data Warehouse doesn’t support computed columns, which are used in Aaron’s script.

I adapted Aaron’s script to work in Azure SQL Data Warehouse and am sharing it with you below, so you don’t have to do the same. I did leave out holidays because I didn’t need them in my calendar table. To add them back in, just add the column back to the table and use Aaron’s update statements. I also changed the MonthYear field to include a space between the month and year. Otherwise, my script should produce the same results as Aaron’s in an Azure SQL DW-friendly way. Notice that I chose to use Round Robin distribution for my date table. Also, tables are now created with clustered columnstore indexes by default in Azure SQL DW.

Instead of using computed columns, I inserted the dates and then updated the other columns in a temporary table. Then I used the Create Table As Select (CTAS) syntax to create my final RPT.Calendar table.
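The full script is in the Gist embedded in the original post; the overall pattern looks roughly like the sketch below. The staging table name, column names, and sample date here are illustrative placeholders, not the actual script.

-- 1) Stage the dates in a temporary table (illustrative columns only).
CREATE TABLE #DimDateStage
(
    [TheDate]   date NOT NULL,
    [DateKey]   int NULL,
    [MonthYear] varchar(20) NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Insert one row per date in the desired range (a WHILE loop or tally approach works);
-- a single row is shown here just to keep the sketch short.
INSERT INTO #DimDateStage ([TheDate]) VALUES ('2016-01-01');

-- 2) Update the columns that were computed columns in Aaron's script.
UPDATE #DimDateStage
SET [DateKey]   = CONVERT(int, CONVERT(varchar(8), [TheDate], 112)),
    [MonthYear] = DATENAME(MONTH, [TheDate]) + ' ' + DATENAME(YEAR, [TheDate]);

-- 3) CTAS into the final table with Round Robin distribution;
--    clustered columnstore is the default index type in Azure SQL DW.
CREATE TABLE RPT.Calendar
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT [TheDate], [DateKey], [MonthYear]
FROM #DimDateStage;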

Happy time trending!

PolyBase Is A Picky Eater – Remove Carriage Returns Before Ingesting Text

I’ve spent the last few weeks working on a project that used PolyBase to load data from Azure Blob Storage into Azure SQL Data Warehouse. While it’s been a great experience, I must note that PolyBase is a picky eater.

PolyBase is a picky eater, just like this bulldog.

Creating the external tables is fairly simple if you are already familiar with T-SQL. The syntax for creating external data sources, file formats, and tables is fairly well documented on MSDN. It’s important to be aware of features and data types that aren’t supported, but I found it to be pretty smooth sailing beyond that.
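For context, the DDL involved looks something like the sketch below. Every object name, the storage location, and the credential here are illustrative placeholders rather than the actual project objects, and the database scoped credential is assumed to already exist.

-- External data source pointing at the blob storage container (illustrative names).
CREATE EXTERNAL DATA SOURCE MyBlobStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = MyStorageCredential  -- database scoped credential created beforehand
);

-- Delimited text file format; the field and string delimiters are configurable,
-- but the row delimiter is not.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', STRING_DELIMITER = '"', USE_TYPE_DEFAULT = FALSE)
);

-- External table over the files in the /tablea/ folder.
CREATE EXTERNAL TABLE dbo.TableA_External
(
    Column1 int,
    Column2 varchar(10),
    Column3 varchar(8000)  -- external tables do not support the text data type
)
WITH (
    DATA_SOURCE = MyBlobStorage,
    LOCATION = '/tablea/',
    FILE_FORMAT = PipeDelimitedText
);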

But just because you have successfully created the external tables does not mean you are finished. That is when the “fun” begins. If you would like more information on why “fun” is in quotes, read Grant Fritchey’s blog post on Loading Data into Azure SQL Data Warehouse.

You should test after populating any table in SQL Server, but I think this is especially true with external tables. More than likely you will find that you must resolve several issues with source file contents and external table definitions.

First let me say that PolyBase is cool. I can query data in text files and join to tables in my database. Next let me say PolyBase is a fairly young technology and has some limitations that I imagine will be improved in later versions.

One of those limitations (as of July 30, 2016) is that while you can declare your field delimiter and a string delimiter in external file formats, the row delimiter is not user configurable and there is no way to escape or ignore the row delimiter characters (\r, \n, or \r\n) inside of a string. So if you have a string that contains the row delimiter, PolyBase will interpret it as the end of the row even if it is placed inside of the string delimiters.

To elaborate further, I had several fields that originally came from a SQL Server table and were of type text. Some of the values in these fields contained newlines (\n) because users had typed paragraphs and addresses into the fields in the source application. The data from the source tables was exported from SQL Server to Azure Blob Storage using Azure Data Factory with a simple copy pipeline and no modifications to the data. The problem is that Hive, PolyBase, and several other tools have issues reading strings with newlines in the values. They immediately interpret a newline as the end of the row. There is no escape character or setting you can use to allow newlines in the values.

If you find yourself in a similar situation, trying to load data from delimited files into Azure SQL DW and realizing you have newlines inside of string fields, there are two things you can do.

  1. Fix the data in the flat files so it doesn’t contain new lines in string fields.
  2. Switch to a different tool to load data to Azure DW. Azure Data Factory can take the data from blob storage and import it into a normal table in Azure DW.

In most circumstances, I would go for option #1. Option #2 only fixes things in Azure DW, leaving other tools in the environment to deal with the issue separately, and it requires storing a copy of the data in the DW.

In my project, I changed the ADF pipelines to replace newlines with an acceptable character/set of characters that doesn’t often appear in my data set and doesn’t obscure the values. We chose to replace them with 4 spaces. It’s important to understand that this means that your data in your blob storage will no longer exactly match its source. This is something you will want to document because it will surely pop up somewhere in the future.

Updating the ADF pipelines is not much effort. If my table definition is

CREATE TABLE TableA (
    Column1 int,
    Column2 varchar(10),
    Column3 text
)

and my original pipeline sqlReaderQuery was SELECT * FROM TableA, I just need to enumerate my fields, convert the text fields to varchar, and replace the new lines.

SELECT
    [Column1],
    [Column2],
    [Column3] = REPLACE(
                  REPLACE(
                    CAST([Column3] AS varchar(8000)), 
                      CHAR(13), '    '), 
                      CHAR(10), '    ')
FROM TableA

If you are using time slices in ADF and you have your query inside of the Text.Format() function, you will find that ADF doesn’t allow the single quotes around the four spaces in your JSON. You can use CHAR(32) instead of a literal space. If you have a better way of accomplishing this, please leave me a note in the comments.

SELECT
    [Column1],
    [Column2],
    [Column3] = REPLACE(
                  REPLACE(
                    CAST([Column3] AS varchar(8000)), 
                      CHAR(13), CHAR(32) + CHAR(32) + CHAR(32) + CHAR(32)), 
                      CHAR(10), CHAR(32) + CHAR(32) + CHAR(32) + CHAR(32))
FROM TableA

In addition to updating the ADF pipelines, I also had to replace the newlines in my existing files in blob storage. Since there weren’t many of them, I just opened them up in Notepad++ and did a find & replace. If there had been more files, I would have looked into a more automated solution.

If the ability to allow field/row terminators within string fields is something you would like to see in PolyBase, please voice your opinion by casting a vote on the feedback site.

Using Context To Traverse Hierarchies In DAX

My friend and coworker Melissa Coates (aka @sqlchick) messaged me the other day to see if I could help with a DAX formula. She had a Power BI dashboard in which she needed a very particular interaction to occur. She had slicers for geographic attributes such as Region and Territory, in addition to a chart that showed the percent of the regional total that each product type represented. The product type was in the fact/data table. Region and territory were in a dimension/lookup table and formed a hierarchy where a region was made up of one or more territories and each territory had only one region.

Territory Hierarchy

The requirement was that if a user selected no regions and one territory, the chart should show the % contribution of each product type for the region in which the territory is located.

PBI Undesired Behavior

Back in the MDX days this would be easy because we could go up the hierarchy (assuming the hierarchy is called Geography) using something like [Geography].[Geography].CurrentMember.Parent or Ancestors([Geography].[Geography].CurrentMember,1).

I miss MDX

DAX doesn’t have this same capability, so you have to approach it a bit differently.

I made a quick demo model to mimic the circumstances.

  • The Geography table contains an ID as well as Region and Territory.
  • The Product table contains an ID and Product Name.
  • The Product Type is actually in the fact/data table along with the Sales Amount.
  • The relationships between the tables are many:1 from the Sales table to the lookup tables, both with cross filter direction set to single.

The ultimate calculation we wanted was <regional product type sales amount> / <region sales amount total for all product types>.

The initial calculation we started with was:

Percent Of Region Total :=
CALCULATE (
 SUM ( 'Sales'[Sales Amount] ),
 ALLSELECTED ( 'Geography'[Region] ),
 ALL ( 'Geography'[Territory] )
)
 / CALCULATE (
 SUM ( 'Sales'[Sales Amount] ),
 ALL ( 'Sales'[Product Type] ),
 ALLSELECTED ( 'Geography'[Region] ),
 ALL ( 'Geography'[Territory] )
 )

Notice that the numerator and denominator are the same, except the denominator overrides the query context for Product Type. This calculation worked fine when a single region was selected and no territories were selected. Once a territory is selected, it limits the scope to just that territory rather than the entire region, which is not desired.

In order to make sure I understood the query and filter context, I added two calculated measures:

TestFilters := IF ( HASONEFILTER ( 'Geography'[Region] ), "1", "Many" )
TestValues := IF ( HASONEVALUE ( 'Geography'[Region] ), "1", "Many" )

I put these measures in a pivot table along with my product types and my geography slicers, and I selected no regions and one territory. The TestFilters measure returned “Many”, but the TestValues measure returned “1”. So I had my first clue as to how to find my desired region.

Filters/Values Test

Next I worked on the numerator of my final calculation. I knew my measure would include the sum of Sales Amount, so I started with that and added CALCULATE() to it in order to change the context. I needed to remove the territory context, so I started with CALCULATE(SUM('Sales'[Sales Amount]), ALL('Geography'[Territory])).

Then I needed to group those results by region and filter to only the desired region, which I did using MAXX(VALUES(Geography[Region])). If I selected a single region or a single territory, I would only get one region back in the values. MAXX() was used to retrieve a single result rather than a table. My numerator ended up as:

Region Contr Numerator :=
MAXX (
VALUES ( Geography[Region] ),
CALCULATE ( SUM ( Sales[Sales Amount] ), ALL ( Geography[Territory] ) )
)

This way, the context of the region corresponding to the selected territory is saved outside of the CALCULATE() in the VALUES('Geography'[Region]), but it is removed inside the CALCULATE() to provide the region total. This formula works when no territories and one region are selected, as well as when no regions and one territory are selected.

The denominator is written similarly, with the addition of the ALL('Sales'[Product Type]) filter inside of the CALCULATE().

Region Contr Denominator =
MAXX (
VALUES ( Geography[Region] ),
CALCULATE (
SUM ( Sales[Sales Amount] ),
ALL ( 'Sales'[Product Type] ),
ALL ( Geography[Territory] )
)
)

To finish up, I created my final calculation and added a bit of error handling for cases when no regions or territories are selected:

Region Avg =
IF (
HASONEVALUE ( 'Geography'[Region] ),
DIVIDE ( [Region Contr Numerator], [Region Contr Denominator] ),
BLANK ()
)

These calculations provided the desired result.

Region Contribution Final

Note that this only works with the cross filter direction set to single in the relationships. Also, originally I had a Product Type attribute in my Product table, used DAX to add it to my Sales table (to replicate the situation we were solving for), and then forgot and put the Product Type from the Product table in my visuals. This clearly didn’t work because I didn’t set my context in the calculations using the field from the Product table.

Hopefully this will come in handy if you have some interesting filter/slicer requirements in your reports.

End of the Power BI Updates List

I just finished updating my Power BI updates list for the last time.  I started the list in November as a way to keep track of all the changes. When I first started it, they weren’t posting the updates to the blog or the What’s New sections in the documentation. Now that Microsoft is on a steady cadence of monthly releases and doing a great job of communicating updates, there isn’t much need for this except as a fun data source. It’s also been about a year since Power BI V2 became generally available, so it seems like a good stopping point.

I removed the link to my updates page, but I left the page available for anyone who has it bookmarked and put a note on the page that the list is no longer being updated. I want to leave it in place because I appreciated the comments and the Power BI visuals people made using the list as a data source.

If you would like a copy of it to continue your own list or to use as a data source, you can access the file at http://1drv.ms/21Kjk3f.

BimlScript – Get to Know Your Control Nuggets

This is post #2 of my BimlScript – Get to Know Your Code Nuggets series. To learn about text nuggets, see my first post.  

The next type of BimlScript code nugget I’d like to discuss is the control nugget. Control nuggets allow you to insert control logic to determine what Biml is generated and used to create your BI artifacts (packages, cubes, etc.).  Control nuggets can be used to:

  • Add conditional logic so that specified Biml fragments are only generated when certain criteria are met
  • Add loops to repeat Biml fragments multiple times
  • Define variables that will be used later
  • Access external data from databases, flat files or other sources that will be used in the generation of the final Biml code

Control nuggets start with <# and end with #>. Just like text nuggets, control nuggets can be a single line or multiple lines. And they can contain simple or complex logic.
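As a small, contrived illustration (this is not one of the example files from my framework), a control nugget can declare a variable and drive a loop, while the <#= ... #> text nuggets from the previous post emit the values into the generated Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <# var tableNames = new[] { "DimCustomer", "DimProduct" }; /* control nugget: declare a variable */ #>
    <Packages>
        <# foreach (var tableName in tableNames) { /* control nugget: open a loop */ #>
        <Package Name="Load <#=tableName#>" ConstraintMode="Linear" />
        <# } /* control nugget: close the loop */ #>
    </Packages>
</Biml>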

There were actually control nuggets in the text nugget example from the previous post. The variable declarations at the top of the file (lines 4 – 10) are control nuggets.

Below is another example file that uses code nuggets. I like to separate my design patterns, project connections, and package generation into separate files. The BimlScript shown below is from a slightly altered “caller file”, the file that I would execute to create the SSIS packages that should follow a specified design pattern. This caller file generates Type 1 slowly changing dimensions.

In my Biml framework, I store the data I need in order to generate my packages in SQL tables. You can see the control nugget in lines 14 – 40 retrieving data from my database and storing it in a data table for later use.

On line 46, there is a control nugget containing a foreach loop. For each row in the data table, it calls a BimlScript file that creates a package following my Type 1 SCD design pattern, using the variables from the data table. Notice that the end curly brace for my loop is in a separate control nugget.

There are many different ways to use control nuggets that aren’t covered here.  From these two examples you can start to see how I might combine text nuggets and control nuggets to automate my SSIS package creation and employ consistent design patterns. But there are still a few missing pieces that will be filled in when I cover the remaining code nugget types.