Month: February 2016

Trekking through the DAX Jungle In Search of Lost Customers

I like to think I’m proficient at writing DAX and building SSAS tabular models. I enjoy a good challenge and appreciate requirements that cause me to stretch and learn. But sometimes I hit a point where I realize I must go for help because I’m not going to complete this challenge in a timely manner on my own.  I think this is part of how most IT professionals operate.

This, but replace adultier adult with DAX-ier (more knowledgeable?) BI Consultant

Luckily, I am part of a great SQL Server community, and I also work with some really smart people.

I was testing some calculations in my tabular model when I noticed that my Lost Customers calculation wasn’t working as expected. This was rather interesting to me since the calculation I was using was from DAXPatterns.com.  After some experimentation, I determined that the difference between my data situation, and the situation that fit the DAX Pattern was that my customer dimension was a Type 2 Slowly Changing Dimension. That meant I couldn’t use the customer key to identify a unique customer, since each customer could have multiple customer keys  (surrogate keys) with the same customer number (business key). Specifically, if a customer made a purchase in December 2015, then changed their name, then made a purchase in January and February, my calculation was counting them as lost because it was not recognizing that the customer with the new name and same customer number was actually the same customer.

My original calculation that didn't properly handle Type 2 SCDs:
 Lost Customers :=
 IF (
     NOT (
         MIN ( 'Date'[Full Date] )
             CALCULATE ( MAX ( Sales[Invoice Date] )ALL ( Sales ) )
     ),
     COUNTROWS (
         FILTER (
             ADDCOLUMNS (
                 FILTER (
                     CALCULATETABLE (
                         ADDCOLUMNS (
                             VALUES ( Sales[CustomerNo] ),
                             "CustomerLostDate"                             CALCULATE ( MAX ( Sales[Invoice Date] ) )
                             + [Lost Days Limit]
                         ),
                         FILTER (
                             ALL ( 'Date' ),
                             AND (
                                 'Date'[Full Date] < MIN ( 'Date'[Full Date] ),
                                 'Date'[Full Date]
                                     >= MIN ( 'Date'[Full Date] ) 
                                         - [Lost Days Limit]
                             )
                         )
                     ),
                    AND (
                         AND (
                             [CustomerLostDate] >= MIN ( 'Date'[Full Date] ),
                             [CustomerLostDate] <= MAX ( 'Date'[Full Date] )
                         ),
                      [CustomerLostDate] <= 
                       CALCULATE (MAX ( Sales[Invoice Date] )ALL ( Sales ) )
                     )
                 ),
                 "FirstBuyInPeriod"CALCULATE ( MIN ( Sales[Invoice Date] ) )
             ),
             OR ( 
                 ISBLANK ( [FirstBuyInPeriod] ), 
                 [FirstBuyInPeriod] > [CustomerLostDate] 
              )
         )
     )
 )

Having put forth a good effort on my own and not wanting to go down the rabbit hole for several days, I decided to post my issue to the Power BI forums where Marco Russo (one of the authors of DAXPatterns.com) was able to help me.
Based upon his suggestions, my final calculation ended up as shown below. Only the bolded part (the innermost ADDCOLUMNS function) was changed

Lost Customers :=
 IF (
     NOT (
         MIN ( 'Date'[Full Date] )
             CALCULATE ( MAX ( Sales[Invoice Date] )ALL ( Sales ) )
     ),
     COUNTROWS (
         FILTER (
             ADDCOLUMNS (
                 FILTER (
                     CALCULATETABLE (
                         ADDCOLUMNS (
                              CALCULATETABLE ( 
                                 VALUES ( Customer[Customer No] ), Sales ),
                                 "CustomerLostDate"CALCULATE (
                                     MAX ( Sales[Invoice Date] ),
                                     ALLEXCEPT ( Customer, Customer[Customer No] )
                              )
                                  + [Lost Days Limit]
                          ),
                         FILTER (
                             ALL ( 'Date' ),
                             AND (
                                 'Date'[Full Date] < MIN ( 'Date'[Full Date] ),
                                 'Date'[Full Date] >= 
                                     MIN ( 'Date'[Full Date] ) - [Lost Days Limit]
                             )
                         )
                     ),
                     AND (
                         AND (
                             [CustomerLostDate] >= MIN ( 'Date'[Full Date] ),
                             [CustomerLostDate] <= MAX ( 'Date'[Full Date] )
                         ),
                         [CustomerLostDate] <=  
                            CALCULATE ( MAX ( Sales[Invoice Date] )ALL ( Sales )                            )
                     )
                 ),
                 "FirstBuyInPeriod"CALCULATE ( MIN ( Sales[Invoice Date] ) )
             ),
             OR ( 
                ISBLANK ( [FirstBuyInPeriod] ), 
                [FirstBuyInPeriod] > [CustomerLostDate] 
             )
         )
     )
 )

In more human-friendly terms, I think this calculation does the following:

  • Create a table of customer numbers that have sales, along with their customer lost date, which is defined as the max invoice date for that customer number after removing all context filters on the customer table except for customer number + the number returned by Lost Days Limit (in my case, 60)
  • Filter the dates on the calculated table such that the [min invoice date minus 60] is less than the selected date which is less than and the minimum invoice date
  • Filter the dates on the calculated table such that the customer lost date is between the minimum invoice date and the maximum invoice date selected and is less than the max invoice date for all sales.
  • Add a column called FirstBuyInPeriod that contains the min invoice date for the selected dates
  • Filter the calculated table such that the minimum invoice date in the selected dates is blank or is greater than the customer lost date

This worked fine and gave me the answer I was looking for, but was (as noted on the DAX Patterns page) rather computing-intensive.

My coworker, Javier Guillen offered a great alternative to Marco’s pattern.

First Date Selected :=
 FIRSTDATE ( DATEADD ( 'Date'[Full Date], 0MONTH ) )
Last Date Selected :=
 LASTDATE ( DATEADD ( 'Date'[Full Date], 0MONTH ) )
Customer Lost Date :=
 CALCULATE (
     MAX ( Sales[Invoice Date] ),
     CALCULATETABLE (
         Sales,
         FILTER (
             ALL ( Customer ),
             Customer[Customer No] = MAX ( Customer[Customer No] )
         ),
         FILTER ( ALL ( 'Date' ), 'Date'[Full Date] <= MAX ( 'Date'[Full Date] ) )
     )
 )
     + [Lost Days Limit]
Lost Customer :=
 SUMX (
     VALUES ( Customer[Customer No] ),
     IF (
         [Customer Lost Date] >= [First Date Selected]
             && [Customer Lost Date] <= [Last Date Selected],
         1
     )
 )

These calculations do the following:

  • First Date Selected: Calculate the minimum date of the dates selected
  • Last Date Selected: Calculate the maximum date of the dates selected
  • Customer Lost Date: Create a table based upon the Sales table that returns the max invoice date + [Lost Days Limit]  with a filter on customer to get the max customer number for each customer record (this is how we alleviate the type 2 SCD issue) and a filter on dates to get dates less than the max selected date
  • Lost Customer: For each customer number, count 1 if the customer lost date is between First Date Selected and Last Date Selected, otherwise count 0. Return the sum.

Javier’s suggestion is less resource intensive and returns results more quickly. But it requires that the date range you choose be contiguous. This means that I can’t create a pivot table and choose Jan and February of 2015 and January and February of 2016 and expect to get a correct answer. That is kind of an edge case situation that might not even be an issue, but I already had a situation for my project where YOY analysis of customer counts was needed, so I opted to go with Marco’s suggestion for now. If you don’t have this use case, I think Javier’s calculations are much easier to understand and implement. You could probably put some conditions around it to blank out if the dates are not contiguous and it would be ready to go.

It was cool to find something that wasn’t already covered by a common DAX pattern and wasn’t frequently blogged about, and I’m grateful for the insight from the solutions offered by both Marco and Javier.