DAX, Microsoft Technologies, Power BI, Power Query

Calculating Age in Power BI

In week 26 of Workout Wednesday for Power BI, I asked people to calculate the age of Nobel laureates at the time they received the award. I provided some logic, but I didn’t prescribe how to create the age calculation. This inspired a couple of questions and a round of data validation, as calculating age may be trickier than you think. In this post, I’ll explore some of the ways people have calculated age in Power BI and the edge cases where those calculations may not work.

In my solution video for Workout Wednesday, I used Power Query to calculate age. This was inspired by several blog posts and videos I had seen previously. There is an Age menu option in the Power Query editor under Date.

Calculating Age with the Power Query Editor user interface

When you select a date column and use that Age option, it calculates the duration between the selected date and the current date in days. You must then replace the current date with the second date column. Next, you can choose Total Years under Duration, which divides the days by 365. Finally, you must round that number down to the nearest whole number to get age in years.

If you follow Ruth’s video, you can do all of that in one step that creates a custom column with the final age value.

 Number.RoundDown(Duration.TotalDays([Date2] - [Date1])/365) 

That is the most common approach in Power Query, as it has no DateDiff function.
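
If you’d rather script the whole step yourself, here is a minimal sketch of that same custom column in M, assuming a previous step named Source with columns [Date1] and [Date2] (those names are placeholders):

// Add an Age column: whole years between Date1 and Date2,
// approximated by dividing elapsed days by 365 and rounding down
AddedAge = Table.AddColumn(
    Source,
    "Age",
    each Number.RoundDown(Duration.TotalDays([Date2] - [Date1]) / 365),
    Int64.Type
)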

There are a few options for calculating age in DAX. Some people use the DATEDIFF function.

Age DateDiff = DATEDIFF([Date1],[Date2],YEAR) 

Another way I have seen is to use the YEARFRAC function.

Age YearFrac = INT ( YEARFRAC ( [Date1], [Date2], 1 ) )

The way Marco Russo suggests is to use QUOTIENT.

Age Quotient (DAX):

Age Quotient =
VAR Birthdate = [Date1]
VAR ThisDay = [Date2]
VAR IntBirthdate = YEAR ( Birthdate ) * 10000 + MONTH ( Birthdate ) * 100 + DAY ( Birthdate )
VAR IntThisDay = YEAR ( ThisDay ) * 10000 + MONTH ( ThisDay ) * 100 + DAY ( ThisDay )
VAR Age = QUOTIENT ( IntThisDay - IntBirthdate, 10000 )
VAR CheckedAge = DIVIDE ( Age, NOT ISBLANK ( Birthdate ) )
RETURN
    CheckedAge
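
To see why this works, take a birthdate of May 15, 1990 and an “as of” date of May 14, 2024. IntBirthdate is 19900515 and IntThisDay is 20240514, so the difference is 339999, and QUOTIENT ( 339999, 10000 ) returns 33: the birthday hasn’t arrived yet, so the age correctly remains 33. One day later the difference becomes 340000 and the age ticks over to 34.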

As Marco points out, many people were using YEARFRAC, but there is a bug in the DAX implementation that causes it to occasionally return an incorrect answer for this purpose.

Checking the Numbers

I created a Power BI file to demonstrate the differences in these four calculations. You can download the file here. The image below displays the results of several tests. For each row, I’m using Date1 as the birthdate and Date2 as the “as of” date. You’ll notice that I focused on leap years for a few cases.

Table in Power BI with 10 date ranges showing the results from the four calculations. 6 of the 10 rows have different results across the calculations.
Example date ranges and result of the four age calculations

Six of the ten date ranges have different results across the different calculation methods.

In the second row, the Power Query age calculation says that Feb 29 to Feb 28 in the following year is a full year. This may or may not be what you want depending on your requirements. I’m noting the difference so you can be aware. A similar thing occurs in the fifth row going from Feb 29, 2016 to Feb 28, 2020, and again on the 9th row going from March 1, 2019 to Feb 29, 2020.

On the third row, notice that the DAX DATEDIFF function calculates Feb 29 to Feb 27 of the following year to be a full year, despite it being a day or two short. Depending on what you do with leap years, you might consider Feb 29 to Feb 28 in the following year to be a full year, but that third row result means DATEDIFF is probably not the calculation I want. We see a similar result going from March 1 to Feb 28 of the following year.

YEARFRAC calculates that Feb 29 to Feb 28 in the following year is not a full year, which may be desirable. But it counts Feb 29, 2016 to Feb 29, 2020 as only three years. And we see that March 1, 2000 to March 1, 2021 is only counted as 20 years. So even without starting on a leap year, we get some incorrect results. Small numbers seem to be correct until it gets to about 13 years.

Using the QUOTIENT function provides what I consider to be the most correct results. It calculates Feb 29 to Feb 28 of the following year to be less than a year. It calculates Feb 29, 2016 to Feb 28, 2020 to be three years and not four. And it calculates March 1 to Feb 29 of the following year to be less than a year.

Which to use?

The QUOTIENT formula produces the most accurate results if you don’t want Feb 29 to Feb 28 the next year to be counted as a year. DATEDIFF and YEARFRAC produce too many incorrect results for me to ever suggest using them. Since there is a DAX option that produces more correct answers, I would just go for QUOTIENT instead of either of these two.

UPDATE: There is a better alternative! Imke Feldmann reminded me that there is a Number.IntegerDivide function in Power Query. So let’s take the logic from Marco’s DAX calculation and move it to Power Query:

(BirthDate as date, EndDate as date) =>
let
    BirthDateInt = Date.Year(BirthDate) * 10000 + Date.Month(BirthDate) * 100 + Date.Day(BirthDate),
    EndDateInt = Date.Year(EndDate) * 10000 + Date.Month(EndDate) * 100 + Date.Day(EndDate),
    Age = Number.IntegerDivide(EndDateInt - BirthDateInt, 10000)
in
    Age
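
To use the function, you might paste it into a blank query (I’ll call it fnAge; the name is a placeholder) and then invoke it in a custom column:

AddedAge = Table.AddColumn(Source, "Age", each fnAge([Date1], [Date2]), Int64.Type)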

The Power Query custom column created by invoking this function should produce better compression than a DAX calculated column. This might not be significant for a small dataset, but we should be efficient when we can.

DAX, Microsoft Technologies, Power BI

DAX Logic and Blanks

A while back I was chatting with Shannon Lindsay on Twitter. She shares lots of useful Power BI tips there. She shared her syntax tip of the & operator being used for concatenation and the && operator being used for boolean AND, which reminded me about implicit conversions and blanks in DAX.

Before you read on to the answers below, see how many of these you can guess correctly:

Blank + 5 = ? 
Blank * 5 = ?
5 / Blank = ?
0 / Blank = ?

In DAX, Blank is converted to 0 in addition and subtraction.

What about boolean logic? Do you know the result of the following expressions?

AND(True(), Blank()) = ? 
OR(True(), Blank()) = ? 
AND(False(), Blank()) = ? 
AND(Blank(), Blank()) = ? 

You can see the results as well as a few more permutations in the screenshot below.

Two tables in a Power BI report. The left table shows arithmetic operations involving blanks. For example, Blank + Blank = Blank, 0 / Blank = NaN, 5 * Blank = Blank, 5 / Blank = Infinity. The right table shows boolean operations involving blanks. True and blank = false, true or blank = true, false and blank = false, blank or blank = false
Read the left table as Number1 [operator] Number2, so 5 + Blank = 5. 5 * Blank = Blank. And 5 / Blank = Infinity. Read the right table as Bool1 [operator] Bool2, so True AND Blank = False and True OR Blank = True.

Why does this matter?

You need to understand the impact of blanks in your data. Do you really want to divide by zero when you are missing data? If you are performing a boolean AND, and your data is blank, are you ok with showing a result of False? Remember that your expression may produce undesired results rather than an error.

First, you need to be aware of where it is possible in your data to get a blank input. When you are writing your DAX measures, you may need to handle blanks. DAX offers the IFERROR() function to check if the result of an expression throws an error. There is also an ISBLANK() function that you can use to check for a blank value and a COALESCE() function to provide an alternate value when a blank value is detected.
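
For example, here’s a minimal sketch of a measure that guards against blank inputs; the Sales[Units] column and the measure name are placeholders:

Sales per Unit :=
VAR Amount = SUM ( Sales[Sales Amount] )
VAR Units = SUM ( Sales[Units] )
RETURN
    IF (
        ISBLANK ( Units ),
        BLANK (),    -- no units recorded, so return blank instead of dividing by zero
        COALESCE ( Amount, 0 ) / Units    -- treat a blank amount as zero
    )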

But adding extra logic in your measures may have a performance impact. For example, the DIVIDE() function can handle divide-by-zero errors for you, but DIVIDE() may be slower than the / operator. The performance difference depends on your data and the expression you are writing. Alternatively, you can use an IF statement to check whether an input value is greater than zero using the > operator. This can be quicker than checking for blanks or errors using other functions.
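
To make the trade-off concrete, here are three ways to write the same ratio (a sketch with placeholder measure names), each behaving differently when the denominator is blank or zero:

Ratio Operator := [Numerator] / [Denominator]    -- blank or zero denominator yields Infinity or NaN
Ratio Divide := DIVIDE ( [Numerator], [Denominator] )    -- returns blank instead, with a bit of overhead
Ratio Guarded := IF ( [Denominator] > 0, [Numerator] / [Denominator] )    -- cheap check; blank otherwise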

At the end of the day, producing the correct result is more important than fast performance, but we strive to achieve both. If you have any tips for handling blanks in DAX, please share them in the comments.

DAX, Microsoft Technologies, SSAS

DAX Date Dimension and Fun with Date Math

I was working on an SSAS Tabular 2016 solution for a project for which I had an empty data model but no data yet. I was not in control of the source data warehouse, so I couldn’t change what I had, but I needed to get started. So I went about creating a date dimension with a fiscal calendar entirely in DAX. There are several scripts out there to create a date dimension. Although there are many nice alternatives for Power BI that are built in M or a combination of M and DAX, I haven’t seen many full date dimensions with a fiscal calendar built just in DAX.

I’ll explain some of the DAX calculations below, but I put the TMSL/JSON in a Gist so you can easily paste it into an SSAS 2016 Tabular project. The JSON has the added benefit of retaining data types, formatting, hidden, and sort by properties.


{
"name": "Date",
"dataCategory": "Time",
"columns": [
{
"type": "calculatedTableColumn",
"name": "Date",
"dataType": "dateTime",
"isNameInferred": true,
"isDataTypeInferred": true,
"isKey": true,
"sourceColumn": "[Date]",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Month Nbr",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "month([Date])",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "DateKey",
"dataType": "int64",
"isDataTypeInferred": false,
"expression": "VALUE(FORMAT([Date],\"YYYYMMDD\"))",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Year",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "YEAR([Date])",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "YearMonthNum",
"dataType": "string",
"isDataTypeInferred": true,
"isHidden": true,
"expression": "format([Date],\"YYYYMM\")"
},
{
"type": "calculated",
"name": "Month Year Abbrev",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "format([Date],\"mmm YYYY\")",
"sortByColumn": "YearMonthNum"
},
{
"type": "calculated",
"name": "Month Name",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "format([Date],\"mmmm\")",
"sortByColumn": "Month Nbr"
},
{
"type": "calculated",
"name": "Month Abbrev",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "Format([Date],\"mmm\")",
"sortByColumn": "Month Nbr"
},
{
"type": "calculated",
"name": "Day Of Week Nbr",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "Weekday([Date])",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Day Of Week",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "Format([date],\"dddd\")",
"sortByColumn": "Day Of Week Nbr"
},
{
"type": "calculated",
"name": "Weekday Abbrev",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "left([Day Of Week],3)",
"sortByColumn": "Day Of Week Nbr"
},
{
"type": "calculated",
"name": "Quarter Nbr",
"dataType": "double",
"isDataTypeInferred": true,
"expression": "VALUE(format([date],\"Q\"))",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Quarter",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "\"Q\" & [Quarter Nbr]",
"sortByColumn": "Quarter Nbr"
},
{
"type": "calculated",
"name": "Quarter Year",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "[Quarter] & \" \" & [Year]",
"sortByColumn": "YrQtrSort"
},
{
"type": "calculated",
"name": "YrQtrSort",
"dataType": "double",
"isDataTypeInferred": true,
"isHidden": true,
"expression": "[year]*10+[Quarter Nbr]"
},
{
"type": "calculated",
"name": "Day of Month",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "day([date])",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Fiscal Month Nbr",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "month(edate([date],[Month Shift]))",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Fiscal Month",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "\"FM \" & [Fiscal Month Nbr]",
"sortByColumn": "Fiscal Month Nbr"
},
{
"type": "calculated",
"name": "Fiscal Year Nbr",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": " if( [Month Nbr] >= VALUE(format(Date([Year],[Fiscal Year Month Begin],1),\"M\")), [Year]+1, [Year])",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Fiscal Year",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "\"FY \" & [Fiscal Year Nbr]",
"sortByColumn": "Fiscal Year Nbr"
},
{
"type": "calculated",
"name": "FiscalMonthSort",
"dataType": "int64",
"isDataTypeInferred": true,
"isHidden": true,
"expression": " [Fiscal Year Nbr]*100+[Fiscal Month Nbr]"
},
{
"type": "calculated",
"name": "Fiscal Month Year",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "[Month Abbrev] & \" \" & [Fiscal Year Nbr]",
"sortByColumn": "FiscalMonthSort"
},
{
"type": "calculated",
"name": "Fiscal Quarter Nbr",
"dataType": "double",
"isDataTypeInferred": true,
"expression": "ROUNDUP(MONTH(EDATE([Date],[Month Shift]))/3,0)",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Fiscal Quarter",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "\"Q\" & [Fiscal Quarter Nbr]",
"sortByColumn": "Fiscal Quarter Nbr"
},
{
"type": "calculated",
"name": "Fiscal Quarter Year",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "[Fiscal Quarter] & \" \" & [Fiscal Year Nbr]",
"sortByColumn": "Fiscal Quarter Sort"
},
{
"type": "calculated",
"name": "End Of Month",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "EOMONTH([date],0)",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "End Of Quarter",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "if([Month Nbr]<4,date([Year],3,31),if([Month Nbr]<7,DATE([Year],6,30),if([Month Nbr]<10,Date([Year],9,30),Date([Year],12,31))))",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "End of Year",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "date([Year],12,31)",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "End of Fiscal Year",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "EOMONTH(Date([Fiscal Year Nbr],[Fiscal Year Month Begin],1),-1)",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Fiscal Quarter Sort",
"dataType": "double",
"isDataTypeInferred": true,
"isHidden": true,
"expression": "[year]*10+[Fiscal Quarter Nbr]"
},
{
"type": "calculated",
"name": "Calendar Week Nbr",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "WEEKNUM([Date],1)",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Begin of Month",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "date([Year],[Month Nbr],1)",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Begin of Quarter",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "if([Month Nbr]<4,date([Year],1,1),if([Month Nbr]<7,DATE([Year],4,1),if([Month Nbr]<10,Date([Year],7,1),Date([Year],10,1))))",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Begin of Year",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "date([Year],1,1)",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Begin of Fiscal Quarter",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": [
"if([Fiscal Quarter Nbr]=1,[Begin of Fiscal Year],",
"if([Fiscal Quarter Nbr] = 2, EDATE([Begin of Fiscal Year],3),",
"if([Fiscal Quarter Nbr] = 3, EDATE([Begin of Fiscal Year],6),",
"EDATE([Begin of Fiscal Year],9))))"
],
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Begin of Fiscal Year",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": "date([Fiscal Year Nbr]-1,[Fiscal Year Month Begin],1)",
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
},
{
"type": "calculated",
"name": "Fiscal Week Nbr",
"dataType": "int64",
"isDataTypeInferred": true,
"expression": "INT(([Date]-Weekday([Date]+1)-Date('Date'[Year]-([Month Nbr]<[Fiscal Year Month Begin]),[Fiscal Year Month Begin],1))/7)+2",
"formatString": "0",
"summarizeBy": "none"
},
{
"type": "calculated",
"name": "Calendar Week",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "\"W\"&[Calendar Week Nbr]",
"sortByColumn": "Calendar Week Nbr"
},
{
"type": "calculated",
"name": "Fiscal Week",
"dataType": "string",
"isDataTypeInferred": true,
"expression": "\"FW\"&[Fiscal Week Nbr]",
"sortByColumn": "Fiscal Week Nbr"
},
{
"type": "calculated",
"name": "Month Shift",
"dataType": "int64",
"isDataTypeInferred": true,
"isHidden": true,
"expression": "13 – [Fiscal Year Month Begin]"
},
{
"type": "calculated",
"name": "Fiscal Year Month Begin",
"dataType": "int64",
"isDataTypeInferred": true,
"isHidden": true,
"expression": "10"
},
{
"type": "calculated",
"name": "MonthDiff",
"dataType": "int64",
"isDataTypeInferred": true,
"isHidden": true,
"expression": "DateDiff([Begin of Fiscal Year],[Date],MONTH)"
},
{
"type": "calculated",
"name": "End of Fiscal Quarter",
"dataType": "dateTime",
"isDataTypeInferred": true,
"expression": [
"if([Fiscal Quarter Nbr]=1,EDATE([Begin of Fiscal Year],3)-1,",
"if([Fiscal Quarter Nbr] = 2, EDATE([Begin of Fiscal Year],6)-1,",
"if([Fiscal Quarter Nbr] = 3, EDATE([Begin of Fiscal Year],9)-1,",
"EDATE([Begin of Fiscal Year],12)-1)))",
""
],
"formatString": "Short Date",
"annotations": [
{
"name": "Format",
"value": "<Format Format=\"DateTimeShortDatePattern\" />"
}
]
}
],
"partitions": [
{
"name": "CalculatedTable 1",
"source": {
"type": "calculated",
"expression": "Calendar(DATE(2010,1,1),DATE(2025,12,31))"
}
}
]
}

The fiscal calendar assumes calendar months with the fiscal calendar shifted by some number of months. It also assumes that the fiscal year is ahead of the calendar year. That is, fiscal year 2017 starts on the first day of some month in 2016. In my example script, the fiscal year starts in October. If you have a 4-4-5, 4-5-4, or other such calendar, the fiscal calendar calculations in this script won’t help you.

To add this date dimension to your SSAS Tabular project:

  1. Right click on the Model.bim file in the Solution Explorer and choose View Code.
  2. If you have at least one table in the model already, locate the end of the definition of the previous table, inside the table array. Add a comma after the end curly brace for the previous table.
  3. Paste the JSON/TMSL from my Gist.
  4. Save and close the file.
  5. Right click on Model.Bim and choose View Designer.


To adjust the date range included in the Date table, locate the [Date] field. The current calculation is:

=CALENDAR(DATE(2010,1,1),DATE(2025,12,31))

The current date range goes from 1 Jan 2010 through 31 Dec 2025.

To adjust the start of the fiscal year, change the expression in the [Fiscal Year Month Begin] field. The current calculation is:

=10

This field represents the calendar month number in which the fiscal year begins. If your fiscal calendar begins in July, you’ll want to change the 10 to 7.

You shouldn’t have to adjust anything else in the date dimension to get correct values.
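
For example, if your fiscal year begins in July, [Fiscal Year Month Begin] = 7 and the hidden [Month Shift] field = 13 - 7 = 6. A date in July 2016 then gets Fiscal Month Nbr = MONTH(EDATE(July 2016, 6)) = MONTH(January 2017) = 1, and because July’s month number (7) is greater than or equal to 7, Fiscal Year Nbr = 2016 + 1 = 2017.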

Calculation Explanations

I used the FORMAT() function to get a lot of the calendar fields as well as the [DateKey] field.

DateKey = VALUE(FORMAT([Date],"YYYYMMDD"))

Month Year Abbrev = FORMAT([Date],"mmm YYYY")

Month Name = FORMAT([Date],"mmmm")

Day of Week = FORMAT([Date],"dddd")

Quarter Nbr = VALUE(FORMAT([Date],"Q"))

Many of the fiscal calendar fields use the EDATE() function to shift date calculations to the correct month. The [Fiscal Year Nbr] field uses the [Fiscal Year Month Begin] field to ensure the correct year is shown.

Month Shift = 13 - [Fiscal Year Month Begin]

Fiscal Month Nbr = MONTH(EDATE([Date],[Month Shift]))

Fiscal Quarter Nbr = ROUNDUP(MONTH(EDATE([Date],[Month Shift]))/3,0)

Fiscal Year Nbr = IF([Month Nbr] >= VALUE(FORMAT(Date([Year],
[Fiscal Year Month Begin],1),"M")), [Year]+1, [Year])

I also like to include the beginning and end dates for month, quarter, and year.

Begin of Month = DATE([Year],[Month Nbr],1)

End of Month = EOMONTH([Date],0)

Begin of Quarter = IF([Month Nbr]<4,DATE([Year],1,1),
IF([Month Nbr]<7,DATE([Year],4,1),
IF([Month Nbr]<10,DATE([Year],7,1),
DATE([Year],10,1))))

End of Quarter = IF([Month Nbr]<4,DATE([Year],3,31),
IF([Month Nbr]<7,DATE([Year],6,30),
IF([Month Nbr]<10,DATE([Year],9,30),
DATE([Year],12,31))))

Begin of Year = DATE([Year],1,1)

End of Year = DATE([Year],12,31)

DAX now has a nice EOMONTH() function to get the end of the month. There are multiple ways to get the end of quarter; I used an IF statement, but you could also use SWITCH(). I also included begin and end dates for the fiscal quarter and year (end of month is the same, since my calendar assumes fiscal months are calendar months).

Begin of Fiscal Year = DATE([Fiscal Year Nbr]-1,[Fiscal Year Month Begin],1)

End of Fiscal Year = EOMONTH(
Date([Fiscal Year Nbr],[Fiscal Year Month Begin],1),-1)

Begin of Fiscal Quarter = IF([Fiscal Quarter Nbr]=1,[Begin of Fiscal Year],
IF([Fiscal Quarter Nbr] = 2, EDATE([Begin of Fiscal Year],3),
IF([Fiscal Quarter Nbr] = 3, EDATE([Begin of Fiscal Year],6),
EDATE([Begin of Fiscal Year],9))))

End of Fiscal Quarter = IF([Fiscal Quarter Nbr]=1,
EDATE([Begin of Fiscal Year],3)-1,
IF([Fiscal Quarter Nbr] = 2, EDATE([Begin of Fiscal Year],6)-1,
IF([Fiscal Quarter Nbr] = 3, EDATE([Begin of Fiscal Year],9)-1,
EDATE([Begin of Fiscal Year],12)-1)))

The EOMONTH() function allows you to shift forward or backward from the current selected month. For the end of the fiscal year, I went to the beginning of the next fiscal year, subtracted a month, and then found the end of that month. The end of fiscal quarter calculation follows similar logic: get the month that begins the next quarter and subtract one day.
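
For example, with a fiscal year beginning in October, fiscal year 2017 begins on Oct 1, 2016. End of Fiscal Year = EOMONTH(DATE(2017,10,1),-1) = Sep 30, 2017, and End of Fiscal Quarter for the first fiscal quarter = EDATE(Oct 1, 2016, 3) - 1 = Dec 31, 2016.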


DAX, Microsoft Technologies, Power BI, Power Pivot

Using Context To Traverse Hierarchies In DAX

My friend and coworker Melissa Coates (aka @sqlchick) messaged me the other day to see if I could help with a DAX formula. She had a Power BI dashboard in which she needed a very particular interaction to occur. She had slicers for geographic attributes such as Region and Territory, in addition to a chart that showed the percent of the regional total that each product type represented. The product type was in the fact/data table. Region and territory were in a dimension/lookup table and formed a hierarchy where a region was made up of one or more territories and each territory had only one region.

Territory Hierarchy

The requirement was that if a user selected no regions and one territory, the chart should show the % contribution of each product type for the region in which the territory is located.

PBI Undesired Behavior

Back in the MDX days this would be easy because we could go up the hierarchy (assuming the hierarchy is called Geography) using something like [Geography].[Geography].CurrentMember.Parent or Ancestors([Geography].[Geography].CurrentMember,1).

I miss MDX

DAX doesn’t have this same capability, so you have to approach it a bit differently.

I made a quick demo model to mimic the circumstances.

  • The Geography table contains an ID as well as Region and Territory.
  • The Product table contains an ID and Product Name.
  • The Product Type is actually in the fact/data table along with the Sales Amount.
  • The relationships between the tables are many:1 from the Sales table to the lookup tables, both with cross filter direction set to single.

The ultimate calculation we wanted was <regional product type sales amount> / <region sales amount total for all product types>.

The initial calculation we started with was:

Percent Of Region Total :=
CALCULATE (
 SUM ( 'Sales'[Sales Amount] ),
 ALLSELECTED ( 'Geography'[Region] ),
 ALL ( 'Geography'[Territory] )
)
 / CALCULATE (
 SUM ( 'Sales'[Sales Amount] ),
 ALL ( 'Sales'[Product Type] ),
 ALLSELECTED ( 'Geography'[Region] ),
 ALL ( 'Geography'[Territory] )
 )

Notice that the numerator and denominator are the same, except the denominator overrides the query context for Product Type. This calculation worked fine when a single region was selected and no territories were selected. Once a territory is selected, it limits the scope to just that territory rather than the entire region, which is not desired.

In order to make sure I understood the query and filter context, I added two calculated measures:

TestFilters := IF ( HASONEFILTER ( 'Geography'[Region] ), "1", "Many" )
TestValues := IF ( HASONEVALUE ( 'Geography'[Region] ), "1", "Many" )

I put these measures in a pivot table along with my product types and my geography slicers, and I selected no regions and one territory. The TestFilters measure returned “Many”, but the TestValues measure returned “1”. So I had my first clue as to how to find my desired region.

Filters/Values Test

Next I worked on the numerator of my final calculation. I knew my measure would include the sum of Sales Amount, so I started with that and wrapped it in CALCULATE() to change the context. I needed to remove the territory context, so I began with Calculate(Sum('Sales'[Sales Amount]), ALL('Geography'[Territory]))

Then I needed to group those results by region and filter to only the desired region, which I did using MAXX(VALUES(Geography[Region])). If I selected a single region or a single territory, I would only get one region back in the values. MAXX() was used to retrieve a single result rather than a table. My numerator ended up as:

Region Contr Numerator :=
MAXX (
VALUES ( Geography[Region] ),
CALCULATE ( SUM ( Sales[Sales Amount] ), ALL ( Geography[Territory] ) )
)

This way, the context of the region corresponding to the selected territory is saved outside of the CALCULATE() in the VALUES('Geography'[Region]), but it was removed inside the CALCULATE() to provide the region total. This formula works when no territories and one region is selected as well as when no regions and one territory is selected.

The denominator is written similarly with the addition of the ALL('Sales'[Product Type]) filter inside of the CALCULATE().

Region Contr Denominator :=
MAXX (
VALUES ( Geography[Region] ),
CALCULATE (
SUM ( Sales[Sales Amount] ),
ALL ( 'Sales'[Product Type] ),
ALL ( Geography[Territory] )
)
)

To finish up, I created my final calculation and added a bit of error handling for cases when no regions or territories are selected:

Region Avg =
IF (
HASONEVALUE ( 'Geography'[Region] ),
DIVIDE ( [Region Contr Numerator], [Region Contr Denominator] ),
BLANK ()
)

These calculations provided the desired result.

Region Contribution Final

Note that this only works with the cross filter direction set to single in the relationships. Also, originally I had a Product Type attribute in my Product table, used DAX to add it to my Sales table (to replicate the situation we were solving for), and then forgot and put the Product Type from Product in my visuals. This clearly didn’t work because I didn’t set my context in the calculations using the field from the Product table.

Hopefully this will come in handy if you have some interesting filter/slicer requirements in your reports.

DAX, Microsoft Technologies, SSAS, Uncategorized

Trekking through the DAX Jungle In Search of Lost Customers

I like to think I’m proficient at writing DAX and building SSAS tabular models. I enjoy a good challenge and appreciate requirements that cause me to stretch and learn. But sometimes I hit a point where I realize I must go for help because I’m not going to complete this challenge in a timely manner on my own.  I think this is part of how most IT professionals operate.

This, but replace adultier adult with DAX-ier (more knowledgeable?) BI Consultant

Luckily, I am part of a great SQL Server community, and I also work with some really smart people.

I was testing some calculations in my tabular model when I noticed that my Lost Customers calculation wasn’t working as expected. This was rather interesting to me since the calculation I was using was from DAXPatterns.com. After some experimentation, I determined that the difference between my data situation and the situation that fit the DAX pattern was that my customer dimension was a Type 2 Slowly Changing Dimension. That meant I couldn’t use the customer key to identify a unique customer, since each customer could have multiple customer keys (surrogate keys) with the same customer number (business key). Specifically, if a customer made a purchase in December 2015, then changed their name, then made a purchase in January and February, my calculation was counting them as lost because it was not recognizing that the customer with the new name and same customer number was actually the same customer.

My original calculation that didn't properly handle Type 2 SCDs:
Lost Customers :=
IF (
    NOT (
        MIN ( 'Date'[Full Date] )
            > CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) )
    ),
    COUNTROWS (
        FILTER (
            ADDCOLUMNS (
                FILTER (
                    CALCULATETABLE (
                        ADDCOLUMNS (
                            VALUES ( Sales[CustomerNo] ),
                            "CustomerLostDate", CALCULATE ( MAX ( Sales[Invoice Date] ) )
                                + [Lost Days Limit]
                        ),
                        FILTER (
                            ALL ( 'Date' ),
                            AND (
                                'Date'[Full Date] < MIN ( 'Date'[Full Date] ),
                                'Date'[Full Date]
                                    >= MIN ( 'Date'[Full Date] )
                                        - [Lost Days Limit]
                            )
                        )
                    ),
                    AND (
                        AND (
                            [CustomerLostDate] >= MIN ( 'Date'[Full Date] ),
                            [CustomerLostDate] <= MAX ( 'Date'[Full Date] )
                        ),
                        [CustomerLostDate]
                            <= CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) )
                    )
                ),
                "FirstBuyInPeriod", CALCULATE ( MIN ( Sales[Invoice Date] ) )
            ),
            OR (
                ISBLANK ( [FirstBuyInPeriod] ),
                [FirstBuyInPeriod] > [CustomerLostDate]
            )
        )
    )
)

Having put forth a good effort on my own and not wanting to go down the rabbit hole for several days, I decided to post my issue to the Power BI forums, where Marco Russo (one of the authors of DAXPatterns.com) was able to help me.
Based upon his suggestions, my final calculation ended up as shown below. Only the innermost ADDCOLUMNS function was changed.

Lost Customers :=
IF (
    NOT (
        MIN ( 'Date'[Full Date] )
            > CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) )
    ),
    COUNTROWS (
        FILTER (
            ADDCOLUMNS (
                FILTER (
                    CALCULATETABLE (
                        ADDCOLUMNS (
                            CALCULATETABLE (
                                VALUES ( Customer[Customer No] ), Sales ),
                            "CustomerLostDate", CALCULATE (
                                MAX ( Sales[Invoice Date] ),
                                ALLEXCEPT ( Customer, Customer[Customer No] )
                            )
                                + [Lost Days Limit]
                        ),
                        FILTER (
                            ALL ( 'Date' ),
                            AND (
                                'Date'[Full Date] < MIN ( 'Date'[Full Date] ),
                                'Date'[Full Date]
                                    >= MIN ( 'Date'[Full Date] ) - [Lost Days Limit]
                            )
                        )
                    ),
                    AND (
                        AND (
                            [CustomerLostDate] >= MIN ( 'Date'[Full Date] ),
                            [CustomerLostDate] <= MAX ( 'Date'[Full Date] )
                        ),
                        [CustomerLostDate]
                            <= CALCULATE ( MAX ( Sales[Invoice Date] ), ALL ( Sales ) )
                    )
                ),
                "FirstBuyInPeriod", CALCULATE ( MIN ( Sales[Invoice Date] ) )
            ),
            OR (
                ISBLANK ( [FirstBuyInPeriod] ),
                [FirstBuyInPeriod] > [CustomerLostDate]
            )
        )
    )
)

In more human-friendly terms, I think this calculation does the following:

  • Create a table of customer numbers that have sales, along with their customer lost date, which is defined as the max invoice date for that customer number after removing all context filters on the customer table except for customer number + the number returned by Lost Days Limit (in my case, 60)
  • Filter the dates on the calculated table to those that fall before the minimum selected date but no more than the Lost Days Limit (60 days, in my case) before it
  • Filter the dates on the calculated table such that the customer lost date is between the minimum invoice date and the maximum invoice date selected and is less than the max invoice date for all sales.
  • Add a column called FirstBuyInPeriod that contains the min invoice date for the selected dates
  • Filter the calculated table such that the minimum invoice date in the selected dates is blank or is greater than the customer lost date

This worked fine and gave me the answer I was looking for, but it was (as noted on the DAX Patterns page) rather compute-intensive.

My coworker Javier Guillen offered a great alternative to Marco’s pattern.

First Date Selected :=
 FIRSTDATE ( DATEADD ( 'Date'[Full Date], 0, MONTH ) )

Last Date Selected :=
 LASTDATE ( DATEADD ( 'Date'[Full Date], 0, MONTH ) )

Customer Lost Date :=
 CALCULATE (
     MAX ( Sales[Invoice Date] ),
     CALCULATETABLE (
         Sales,
         FILTER (
             ALL ( Customer ),
             Customer[Customer No] = MAX ( Customer[Customer No] )
         ),
         FILTER ( ALL ( 'Date' ), 'Date'[Full Date] <= MAX ( 'Date'[Full Date] ) )
     )
 )
     + [Lost Days Limit]

Lost Customer :=
 SUMX (
     VALUES ( Customer[Customer No] ),
     IF (
         [Customer Lost Date] >= [First Date Selected]
             && [Customer Lost Date] <= [Last Date Selected],
         1
     )
 )

These calculations do the following:

  • First Date Selected: Calculate the minimum date of the dates selected
  • Last Date Selected: Calculate the maximum date of the dates selected
  • Customer Lost Date: Create a table based upon the Sales table that returns the max invoice date + [Lost Days Limit]  with a filter on customer to get the max customer number for each customer record (this is how we alleviate the type 2 SCD issue) and a filter on dates to get dates less than the max selected date
  • Lost Customer: For each customer number, count 1 if the customer lost date is between First Date Selected and Last Date Selected, otherwise return a blank. Return the sum.

Javier’s suggestion is less resource intensive and returns results more quickly, but it requires that the date range you choose be contiguous. This means that I can’t create a pivot table, choose January and February of 2015 and January and February of 2016, and expect to get a correct answer. That is kind of an edge case that might not even be an issue, but I already had a situation for my project where YOY analysis of customer counts was needed, so I opted to go with Marco’s suggestion for now. If you don’t have this use case, I think Javier’s calculations are much easier to understand and implement. You could probably put some conditions around it to blank out if the dates are not contiguous (sketched below) and it would be ready to go.
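
As a rough sketch of that idea (my own addition, not from Javier, and untested against this model): compare the count of selected dates to the span between the first and last selected date, and blank out when they don’t match.

Lost Customer Contiguous :=
IF (
    COUNTROWS ( VALUES ( 'Date'[Full Date] ) )
        = INT ( [Last Date Selected] - [First Date Selected] ) + 1,
    [Lost Customer]
)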

It was cool to find something that wasn’t already covered by a common DAX pattern and wasn’t frequently blogged about, and I’m grateful for the insight from the solutions offered by both Marco and Javier.

DAX, Microsoft Technologies, Power BI, SSAS, Uncategorized

Creating a Matrix in Power BI With Multiple Values on Rows

This week I was asked to create a matrix in a Power BI report that looks like this:

Matrix with Values on Rows
Matrix with Values on Rows (numbers faked to protect the innocent)

To my surprise, Power BI only lets you put multiple values on columns in a matrix. You can’t stack metrics vertically. Note: this is true as of 8 Jan 2016 but may change in the future. If you agree that this should be a feature in Power BI, please make your voice heard and vote for this idea on the Power BI forum and encourage others to vote for it as well.

Since I needed to finish this report, I got a little creative with a disconnected table and some DAX to implement a workaround in my tabular model.

The Disconnected Table

I created a table called Row Order in my tabular model using the following SQL Query as a source.

select 1 as [Row Order], 'Lost Customers' as [Customer Metric]
union all
select 2 as [Row Order], 'Inactive Customers' as [Customer Metric]
union all
select 3 as [Row Order], 'New Customers' as [Customer Metric]

I set the Sort By Column property on the Customer Metric column to Row Order and hid the Row Order column.

Calculated Measures

I already had the base metrics for my matrix:

  • Lost Customers – Week (customers considered lost in the current week, based upon the selected date)
  • Lost Customers – Month (customers considered lost in the current month, based upon the selected date)
  • Lost Customers – Quarter (customers considered lost in the current quarter, based upon the selected date)
  • Inactive Customers – Week (customers considered to have moved to inactive status in the current week, based upon the selected date)
  • Inactive Customers – Month (customers considered to have moved to inactive status in the current month, based upon the selected date)
  • Inactive Customers – Quarter (customers considered to have moved to inactive status in the current quarter, based upon the selected date)
  • New Customers – Week (customers who have made their first purchase in the current week, based upon the selected date)
  • New Customers – Month (customers who have made their first purchase in the current month, based upon the selected date)
  • New Customers – Quarter (customers who have made their first purchase in the current quarter, based upon the selected date)

As a side note, when creating time-based measures I prefer to create the measure such that it accepts a date and then filters off of that, rather than assuming everyone wants to see the current week/month/quarter. I then add an Is Today field on my date table so that if someone does want to see the current week, they can easily select that in a filter on this field. This means that I can see lost customers for any week, not just the current one. Maybe I want to filter my dashboard to the end of December. If I had just created a measure that assumed I wanted last month, I wouldn’t be able to just change my date filter and get the data I want. As another example, lots of people create a metric to calculate sales in the last 6 weeks, but many assume they want the last 6 weeks from today rather than allowing the user to choose a date and then calculating sales in the 6 weeks leading up to that day. I much prefer the more flexible calculation.

But having this flexibility in my calculations meant that I couldn’t just put some calculated column for time period in my date dimension. And even if I had made the assumption that I wanted to see the data as of today, it would probably be a separate table or a many-to-many relationship (e.g., 5 Jan 2016 would be in the last week, month, and year).

With my new disconnected table added, I just needed to add a few more measures. First, I needed something to check which customer metric I should be showing:
RowOrder:=Max('Row Order'[Row Order])

I hid the RowOrder measure. Then I created one measure for each time frame (week/month/quarter) that would select the correct measure based upon the Customer Metric on that row.
Weekly Customers:=IF(HASONEVALUE('Row Order'[Row Order]), SWITCH([RowOrder],1,[Lost Customers - Week], 2,[Inactive Customers - Week], 3,[New Customers - Week],BLANK()), BLANK())
Monthly Customers:=IF(HASONEVALUE('Row Order'[Row Order]), SWITCH([RowOrder],1,[Lost Customers - Month], 2,[Inactive Customers - Month], 3,[New Customers - Month],BLANK()), BLANK())
Quarterly Customers:=IF(HASONEVALUE('Row Order'[Row Order]), SWITCH([RowOrder],1,[Lost Customers - Quarter], 2,[Inactive Customers - Quarter], 3,[New Customers - Quarter],BLANK()), BLANK())

Power BI Matrix

I opened my report on Power BI.com and created my matrix.

I put Customer Metric in Rows and Weekly Customers, Monthly Customers, and Quarterly Customers in Values. Then I filtered my table to As of Today = “As Of Today”.

PBI Matrix
Fields List for Power BI Matrix

That gives me the matrix I wanted to create. I don’t love that I had to add the extra table, but it works for now until I can achieve this natively with Power BI.

Power BI Matrix
Power BI Matrix using disconnected table and calculated measures (numbers changed to protect the innocent)


DAX, Excel, Microsoft Technologies, Power BI, Power Pivot

Improving Performance in Excel and Power View Reports with a Power Pivot Data Source

On a recent project at work, I ran into some performance issues with reports in Excel built against a Power Pivot model. I had two Power View reports and two Excel pivot table reports; both Excel reports were slow, and one Power View was slow. I did some research on how to make the reports return results more quickly, and I found some great resources that I’ve compiled here in the hopes that they help you as well.

Purposeful Sorting of Source Queries

Alberto Ferrari suggests that you check the sorting and design of the tables you import into Power Pivot.  It seems that sort order does not significantly affect the final size of the data in Power Pivot, but it can affect performance when querying the model. I believe the goal is to order the columns based upon the number of distinct values ascending. I didn’t see a measurable improvement in querying my Power Pivot model after implementing this, but the benefits realized probably depend a lot on the type and size of your data.

Power Pivot Data Source Tips

André Van de Graaf has a great list of Power Pivot tips on his blog.  My favorites are:

  • Use views to import data from a database.
  • Import only the columns that are useful and analytically relevant.
  • Reduce the number of rows to import by including only those you are going to analyze (e.g.: perhaps you only need the last two years of data instead of five).

I used database views to specify only the columns and rows I needed in addition to performing any calculations that could be done on the row-level data.  Putting the calculation in the SQL view is often more efficient (and easier for DAX beginners) than creating a calculated column in a Power Pivot table.

Query Performance and Slicers

Power Pivot Pro has a good blog post about report performance when using slicers.  It includes some great tips:

  • Avoid using slicers for fields with many distinct values or fields that come from very large (usually fact/measures) tables.
  • Reduce the number of slicers.  If you can’t reduce the number of slicers, consider turning off cross-filtering on the slicers.
  • Reduce the number and complexity of measures.

It turns out that slicers issue 2 queries each: one to get the list of values and one to determine if those values have related data in the connected pivot table(s).  Slicers are a wonderful usability improvement for report consumers, and I think you should use them in place of report filters whenever it makes sense.  But they can slow performance to an unacceptable level. This is why you may need to remove a couple of slicers from a report that has several. Or you might just turn off cross-filtering.  Since the query for the pivot table basically has to be executed to determine which of the slicer values have related data, simplifying the measures and calcs used in your pivot table helps as well.

My Power Pivot model contained sales data by product. Those products were segmented into 3 product categories.  I had a product category slicer in my report for which I turned off cross-filtering.  With only 3 values, the cross-filtering wasn’t as valuable as the performance improvement achieved by turning it off.

Performance Tuning of Tabular Models

I also re-read the white paper on performance tuning SSAS tabular. Some of the suggestions weren’t applicable since I didn’t control the server where the Power Pivot model sits (my workbook was stored in Office 365). But the tips on changing the DAX calculations and the model are helpful:

  • Convert calculations into calculated columns.
  • Avoid error handling functions.
  • Detect filters using optimized functions like HASONEVALUE() and HASONEFILTER() rather than COUNTROWS().
  • Use CALCULATE() instead of FILTER().
  • Decide whether to use ISBLANK() versus ISEMPTY().
  • Use natural hierarchies, if possible.

Calculated columns are computed when the model is processed (refreshed) whereas calculated measures are computed during query execution. So if your queries are slow, you may want to attempt to use a calculated column to simplify a complex calculated measure, which should improve query execution times.

I used HASONEVALUE() in several calculations that dealt with time intelligence. We did a lot of year over year comparisons, so I had formulas to compare a 4 week period to the same 4 week period in the prior year based upon the fiscal calendar.

Qty Sold Last 4 Wks Prior Yr:=IF(HASONEVALUE('End Date'[FiscalWeekKey]),
 CALCULATE([Total Quantity Sold], FILTER (ALL ('End Date'),
 'End Date'[DaySeq] <= MAX ('End Date'[DaySeqLastYear])
 && 'End Date'[DaySeq] > MAX ('End Date'[DaySeqLastYear]) - 28)), BLANK())

HASONEVALUE() checks to make sure the row or column in the pivot table where this calculation is used includes only one week, so that I can be sure I have the correct 4-week period.

This project reinforced the fact that the DAX CALCULATE() function is very powerful and usually much faster than FILTER(). I had a few calculations that started out taking minutes to return in a model that had one fact table with about 8 million rows and 3 small dimension tables. I rewrote the calculations to remove the FILTER() function and just use CALCULATE() with the same criteria in the filter clause, and they returned the correct values in under a second.
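
To illustrate the kind of rewrite, here is a hedged before-and-after sketch with placeholder table and measure names; the boolean filter argument in the second version is expanded internally by CALCULATE() and avoids iterating the whole table through FILTER():

-- Before: iterating the Geography table row by row
Qty West Slow := CALCULATE ( [Total Quantity Sold],
    FILTER ( 'Geography', 'Geography'[Region] = "West" ) )

-- After: the same criteria expressed as a simple filter argument
Qty West Fast := CALCULATE ( [Total Quantity Sold], 'Geography'[Region] = "West" )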

Conclusion

In the end, I used a combination of these tips for my project.  The changes that had the biggest impact were:

  • I reduced the number of rows I was importing.  I originally was pulling all data from 1/1/2010 forward, but I changed it to show the current calendar year plus the previous 3 years.
  • I turned off cross-filtering on some of my slicers.
  • I rewrote some calculated measures to use CALCULATE() and remove FILTER().

As PowerPivotPro said almost 3 years ago, slicer cross-filtering is very often the #1 source of slowdown. It is still true today, and it will be one of the first places I look when I am faced with performance issues in Excel reports in the future.

Data Visualization, DAX, Excel, Microsoft Technologies, Power BI, Power Pivot, Power View

Color Coding Values in Power View Maps Based Upon Positive/Negative Sign

Power View can be a good tool for interactive data visualization and data discovery, but it has a few limitations with its mapping capabilities.

  • Power View only visualizes data on maps using bubbles/pies.
  • The size of the bubble on a Power View map can be misleading during analysis for data sets with small values and negative values.
  • By default, Power View can only display one measure on a map, which it uses to determine the size of the bubble.

Microsoft offers other mapping capabilities in Power Map and SSRS maps, so if all you need is a map, you can explore those other tools to determine if they better meet your needs. Check out my presentation on Choosing Your Geospatial Mapping Tool for more info on requirements, capabilities, and limitations of maps in Power View, Power Map, and SSRS. If you want to see a map in the context of other charts, and you want the interactive highlighting and filtering of Power View, you may look for workarounds to the second and third issues listed above.

I recently had a client express disappointment about the display of negative values in Power View maps. A negative value simply gets a small bubble, and positive value gets a larger bubble. You have to hover over the bubbles to see each value and understand the bubble size that represents the change in sign from positive to negative. This gets a little tedious and minimizes the usefulness of the bubble map in the first place. The example Power View map below shows the year over year change in trespassing incidents in my city from 2013 to 2014. The small bubble has a value of -16, and the large bubble to the right has a value of 6.

PV Map Neg Values

One way to better present this type of data in a Power View map is to have positive values represented in one color and negative values represented in another, with the bubble sizes showing the absolute value of the change. This proved to be less than straightforward as I worked my way through it. You can’t just make separate measures for the positive and negative values because Power View won’t let you put two measures on a map in that manner. You can’t simply create a calculated measure that determines the sign of your values and put it in the legend because Power View is expecting categorical data for the legend and won’t let you put a numerical data field there. Power View also doesn’t recognize calculated measures that return a text value (they don’t show up in the field list), so you can’t create a measure that turns the sign into a text value (“Positive” or “Negative”) in an effort to make it more categorical.

But you can use dynamic segmentation to show the second measure as a dimension. There is a great video on MSBI Academy that walks you through this. Although the video shows two different measures on a map, you can use this method to color code positive and negative values of a single measure. First, create a disconnected table for the number sign attributes. It should contain at least:

  • a text field with a descriptor
  • a field for the minimum value of the range
  • a field for the maximum value of the range.

The video shows storing that table in Excel (which is great if you plan to make changes), but you can now also paste the table directly into Power Pivot without any link to the table in Excel. My final table looks like the image below.

PV Map Sign Table
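
If you are recreating that table yourself, it might look something like the following; the boundary values are placeholders that just need to be wider than any value in your data. With the >= Min and < Max comparisons used in the calculation below, zero lands in the positive bucket.

Sign        Min         Max
Negative    -9999999    0
Positive    0           9999999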

Next you need a calculation that uses the disconnected table and displays the absolute value of the measure you want to use for the bubble size.

Change In Incidents YOY Pos/Neg:=
     if(CALCULATE(Countrows('Sign'),
     FILTER('Sign',[Change in Incidents YOY] >= Sign[Min] 
     && [Change in Incidents YOY] < Sign[Max])
     ),
     abs([Change in Incidents YOY])
     )

Then you can build your map.  Use the Pos/Neg field for color, the new calculation for size, and your location fields of choice.

PV Map Pos Neg 1 Layer

Note that you will now see a warning message in the Power View fields list that relationships may be needed. You can disregard that. The use of a disconnected table is the cause of that message, and everything is working as intended.

PV Rel Msg

If your map has no drill down levels, you can stop here. Your map will now have the values for positive and negative in the legend with bubbles of two colors on the map. A bubble whose signed value is 5 will be the same size as a bubble whose signed value is -5 with the color indicating the sign of the number. In this analysis, a positive number indicates an increase in trespassing, so I chose a theme and ordered my values in the sign table such that the positive values were assigned the brighter color. These are the bubbles that need attention since trespassing is rising in the related zip codes.

If you do have drill down levels in your map, you have a couple more steps. You need another measure that deals with the drill down path. Without this extra step you end up with pies and unexpected values.

PV Map Pos Neg 2 Layer Bad

My map allows the user to drill down from zip code to beat. My new calculation references the top level of my drill down path ([Zip Code]), the calculation I just created ([Change In Incidents YOY Pos/Neg]), and the second level of the drill down path in my map ([Beat]). Note that this calculation is specific to the drill down path used in the map. If you have several maps with different drill down paths, you will need to create several calculations for this to work.

Change In Incidents YOY Pt 2:=
     IF(COUNTROWS(ALLSELECTED('KCPD Crime Data 2014'[Zip Code]))=1, 
     [Change In Incidents YOY Pos/Neg], 
     CALCULATE([Change In Incidents YOY Pos/Neg], 
     ALLSELECTED('KCPD Crime Data 2014'[Beat]))
     )

Now I can put this new measure in place of the original one, and the pies will go away, returning the expected color coded bubbles and values.

PV Map Pos Neg 2 Layers SBS

Data Visualization, DAX, Excel, Microsoft Technologies, Power Pivot

Power Pivot: Dynamically Identifying Outliers with DAX

I have been working on a project in which we were looking at durations as an indicator of service levels and customer satisfaction, specifically the maximum duration and average duration. We built a small dimensional data mart in SQL Server to house our data, then pulled it in to Excel using Power Pivot so we could make a few Excel and Power View reports and provide the client with the model for further ad hoc reporting and analysis. After the first iteration of development, we got feedback from the client that there were several outliers that were skewing the metrics. They wanted to see average and max duration for whatever group of objects they had chosen and then see the calculations after removing the top and bottom 5% of values.

Here is a small sample of my fact table, called Object Processing.  Field names and some small details have been changed to protect the innocent.

ObjectKey  StatusKey  CreationDateKey  GeographyKey  Duration
1          1          20140501         1             2193218
2          1          20140503         2             2114354
3          2          20140429         3             (blank)

The first four columns are foreign keys to dimensions. The object dimension contains an object ID, name, owner, and content group. The status dimension contains an ID and description of the current processing status. The date dimension is a standard date dimension representing the date the object was created. The geography dimension contains an ID, country, and region. The duration column represents the number of seconds the object has spent processing. A new row is inserted into the fact table for each object created.

First, I have a few simple DAX calculations. These calculations work great on their own until you need to remove the outliers:

Total duration:= Sum('Object Processing'[Duration])
Avg duration:= Average('Object Processing'[Duration])
Max duration:= Max('Object Processing'[Duration])

To remove the outliers we need to rank the objects by duration. I first read Alberto Ferrari’s blog post on dynamic ranking in Power Pivot. It demonstrates a method to rank items within the table as a calculated column, which didn’t quite work for me because I wanted to dynamically rank items after filters were applied (ex: rank only the objects with a create date of yesterday and a status of completed). Then I found a series of posts on PowerPivotPro.com about Dynamic TopN Reports Via Slicers (parts 1, 2, 3, 4). That was what I needed to work through my similar problem.

First, I created a calculated measure for ranking the objects. If an object had no duration (because it had been created but had not yet reported a duration), I wanted it to be blank rather than rank as the lowest duration. In this formula, I’m ranking Object IDs in descending order based upon my total duration calculation.

Object Duration Rank:=
           if (ISBLANK([Total Duration]), BLANK(),
           RANKX(ALLSELECTED(Object[Object ID]), [Total Duration], ,0))

Next, I needed to identify the top and bottom 5%. First, I counted the total number of objects excluding null durations:

Object Count For Outliers:=
               CALCULATE(DISTINCTCOUNT('Object Processing'[ObjectKey]),
               ALLSELECTED('Object Processing'),
               NOT(ISBLANK('Object Processing'[Duration])))

Then I calculated the number of objects in the top 5%, giving me the rank of the last member of my top outliers group:

Outlier Count:=round([Object Count For Outliers]*.05,0)

To get the rank of the first member of the bottom outliers group I calculated:

Low Outlier Rank:=[Object Count For Outliers]-[Outlier Count]
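
For example, if 500 objects have durations, [Outlier Count] = ROUND(500 * .05, 0) = 25, so ranks 1 through 25 are the top outliers, and [Low Outlier Rank] = 500 - 25 = 475, so ranks 475 and above are the bottom outliers. The in-range objects are those ranked 26 through 474.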

Finally, I wrote a calculation that determines if the object is in range (not an outlier):

Is In Range:=
               if(ISBLANK([Total Duration]), BLANK(),
               if([Object Duration Rank]>[Outlier Count] &&
               [Object Duration Rank]<[Low Outlier Rank], 1, 0))

Now I could filter my pivot table containing the Is In Range calculation for only those values equal to 1 to see the average and max duration with outliers removed.

Ultimately, I produced a summary report similar to this:

Object Duration Summary

We were able to compare object durations for objects created in the last 1, 7, and 30 days, including and excluding outliers. The conditional highlighting is set so that cells are highlighted if the duration for that time period is greater than the duration for the prior time period listed below it. For example, average object duration for yesterday is 5:06:02, which is greater than the average duration for the last 7 days of 4:54:14.

I’m still working on a DAX calculated measure that will return the average object duration excluding outliers in one fell swoop, rather than needing to filter out the outliers in a pivot table.