Trekking through the DAX Jungle In Search of Lost Customers

I like to think I’m proficient at writing DAX and building SSAS tabular models. I enjoy a good challenge and appreciate requirements that cause me to stretch and learn. But sometimes I hit a point where I realize I must go for help because I’m not going to complete this challenge in a timely manner on my own.  I think this is part of how most IT professionals operate.

This, but replace adultier adult with DAX-ier (more knowledgeable?) BI Consultant

Luckily, I am part of a great SQL Server community, and I also work with some really smart people.

I was testing some calculations in my tabular model when I noticed that my Lost Customers calculation wasn’t working as expected. This was rather interesting to me since the calculation I was using was from DAXPatterns.com.  After some experimentation, I determined that the difference between my data situation, and the situation that fit the DAX Pattern was that my customer dimension was a Type 2 Slowly Changing Dimension. That meant I couldn’t use the customer key to identify a unique customer, since each customer could have multiple customer keys  (surrogate keys) with the same customer number (business key). Specifically, if a customer made a purchase in December 2015, then changed their name, then made a purchase in January and February, my calculation was counting them as lost because it was not recognizing that the customer with the new name and same customer number was actually the same customer.

My original calculation that didn't properly handle Type 2 SCDs:
 Lost Customers :=
 IF (
     NOT (
         MIN ( 'Date'[Full Date] )
             CALCULATE ( MAX ( Sales[Invoice Date] )ALL ( Sales ) )
     ),
     COUNTROWS (
         FILTER (
             ADDCOLUMNS (
                 FILTER (
                     CALCULATETABLE (
                         ADDCOLUMNS (
                             VALUES ( Sales[CustomerNo] ),
                             "CustomerLostDate"                             CALCULATE ( MAX ( Sales[Invoice Date] ) )
                             + [Lost Days Limit]
                         ),
                         FILTER (
                             ALL ( 'Date' ),
                             AND (
                                 'Date'[Full Date] < MIN ( 'Date'[Full Date] ),
                                 'Date'[Full Date]
                                     >= MIN ( 'Date'[Full Date] ) 
                                         - [Lost Days Limit]
                             )
                         )
                     ),
                    AND (
                         AND (
                             [CustomerLostDate] >= MIN ( 'Date'[Full Date] ),
                             [CustomerLostDate] <= MAX ( 'Date'[Full Date] )
                         ),
                      [CustomerLostDate] <= 
                       CALCULATE (MAX ( Sales[Invoice Date] )ALL ( Sales ) )
                     )
                 ),
                 "FirstBuyInPeriod"CALCULATE ( MIN ( Sales[Invoice Date] ) )
             ),
             OR ( 
                 ISBLANK ( [FirstBuyInPeriod] ), 
                 [FirstBuyInPeriod] > [CustomerLostDate] 
              )
         )
     )
 )

Having put forth a good effort on my own and not wanting to go down the rabbit hole for several days, I decided to post my issue to the Power BI forums where Marco Russo (one of the authors of DAXPatterns.com) was able to help me.
Based upon his suggestions, my final calculation ended up as shown below. Only the bolded part (the innermost ADDCOLUMNS function) was changed

Lost Customers :=
 IF (
     NOT (
         MIN ( 'Date'[Full Date] )
             CALCULATE ( MAX ( Sales[Invoice Date] )ALL ( Sales ) )
     ),
     COUNTROWS (
         FILTER (
             ADDCOLUMNS (
                 FILTER (
                     CALCULATETABLE (
                         ADDCOLUMNS (
                              CALCULATETABLE ( 
                                 VALUES ( Customer[Customer No] ), Sales ),
                                 "CustomerLostDate"CALCULATE (
                                     MAX ( Sales[Invoice Date] ),
                                     ALLEXCEPT ( Customer, Customer[Customer No] )
                              )
                                  + [Lost Days Limit]
                          ),
                         FILTER (
                             ALL ( 'Date' ),
                             AND (
                                 'Date'[Full Date] < MIN ( 'Date'[Full Date] ),
                                 'Date'[Full Date] >= 
                                     MIN ( 'Date'[Full Date] ) - [Lost Days Limit]
                             )
                         )
                     ),
                     AND (
                         AND (
                             [CustomerLostDate] >= MIN ( 'Date'[Full Date] ),
                             [CustomerLostDate] <= MAX ( 'Date'[Full Date] )
                         ),
                         [CustomerLostDate] <=  
                            CALCULATE ( MAX ( Sales[Invoice Date] )ALL ( Sales )                            )
                     )
                 ),
                 "FirstBuyInPeriod"CALCULATE ( MIN ( Sales[Invoice Date] ) )
             ),
             OR ( 
                ISBLANK ( [FirstBuyInPeriod] ), 
                [FirstBuyInPeriod] > [CustomerLostDate] 
             )
         )
     )
 )

In more human-friendly terms, I think this calculation does the following:

  • Create a table of customer numbers that have sales, along with their customer lost date, which is defined as the max invoice date for that customer number after removing all context filters on the customer table except for customer number + the number returned by Lost Days Limit (in my case, 60)
  • Filter the dates on the calculated table such that the [min invoice date minus 60] is less than the selected date which is less than and the minimum invoice date
  • Filter the dates on the calculated table such that the customer lost date is between the minimum invoice date and the maximum invoice date selected and is less than the max invoice date for all sales.
  • Add a column called FirstBuyInPeriod that contains the min invoice date for the selected dates
  • Filter the calculated table such that the minimum invoice date in the selected dates is blank or is greater than the customer lost date

This worked fine and gave me the answer I was looking for, but was (as noted on the DAX Patterns page) rather computing-intensive.

My coworker, Javier Guillen offered a great alternative to Marco’s pattern.

First Date Selected :=
 FIRSTDATE ( DATEADD ( 'Date'[Full Date], 0MONTH ) )
Last Date Selected :=
 LASTDATE ( DATEADD ( 'Date'[Full Date], 0MONTH ) )
Customer Lost Date :=
 CALCULATE (
     MAX ( Sales[Invoice Date] ),
     CALCULATETABLE (
         Sales,
         FILTER (
             ALL ( Customer ),
             Customer[Customer No] = MAX ( Customer[Customer No] )
         ),
         FILTER ( ALL ( 'Date' ), 'Date'[Full Date] <= MAX ( 'Date'[Full Date] ) )
     )
 )
     + [Lost Days Limit]
Lost Customer :=
 SUMX (
     VALUES ( Customer[Customer No] ),
     IF (
         [Customer Lost Date] >= [First Date Selected]
             && [Customer Lost Date] <= [Last Date Selected],
         1
     )
 )

These calculations do the following:

  • First Date Selected: Calculate the minimum date of the dates selected
  • Last Date Selected: Calculate the maximum date of the dates selected
  • Customer Lost Date: Create a table based upon the Sales table that returns the max invoice date + [Lost Days Limit]  with a filter on customer to get the max customer number for each customer record (this is how we alleviate the type 2 SCD issue) and a filter on dates to get dates less than the max selected date
  • Lost Customer: For each customer number, count 1 if the customer lost date is between First Date Selected and Last Date Selected, otherwise count 0. Return the sum.

Javier’s suggestion is less resource intensive and returns results more quickly. But it requires that the date range you choose be contiguous. This means that I can’t create a pivot table and choose Jan and February of 2015 and January and February of 2016 and expect to get a correct answer. That is kind of an edge case situation that might not even be an issue, but I already had a situation for my project where YOY analysis of customer counts was needed, so I opted to go with Marco’s suggestion for now. If you don’t have this use case, I think Javier’s calculations are much easier to understand and implement. You could probably put some conditions around it to blank out if the dates are not contiguous and it would be ready to go.

It was cool to find something that wasn’t already covered by a common DAX pattern and wasn’t frequently blogged about, and I’m grateful for the insight from the solutions offered by both Marco and Javier.

4 thoughts on “Trekking through the DAX Jungle In Search of Lost Customers

  1. This is great work! I tried implementing something similar to this on a data set that has over 10 million records, and it’s been “still working” for over an hour and still isn’t done. Any suggestions? Is the data set just too big to do it? I have a metric that is the inventory level for each month. If I have multiple months, it will sum up the inventories of the selected months, which isn’t accurate. If multiple months are selected, I’d like to return only the inventory level of the most recent month. I did that with a calculated column: SelectedPeriodInventory = SUMX(Values(Dataset[Inventory]),IF(YearMonth = [Last Year Month Selected]),1)). [Last Year Month Selected] is also a calculated column.

    I also have one other question — is there any way to get a dimension “under” the metrics? For example, the way it currently works right now, if I have four metrics, and I have the year dimension as a column, it will group the metrics by each year, so I have 2011 Inventory, 2011 Cost Sold, 2011 Gross Margin, 2012 Inventory, 2012 Cost Sold, etc. Is there any way (in Power BI) that I can change that so each metric is grouped with itself? For example, 2011 Inventory, 2012 Inventory, 2013 Inventory, 2011 Cost Sold, 2012 Cost Sold, 2013 Cost Sold, etc.

    Thank you!

  2. I figured out a solution to the first part, but not the second one.

    For the first part, I have a measure that holds the max year and month selected: YEAR(CALCULATE(MAX(ItemTotals[Date]),ALLSELECTED(ItemTotals))) & FORMAT(MONTH(CALCULATE(MAX(ItemTotals[Date]),ALLSELECTED(ItemTotals))),”00″)

    Then I have another measure that uses that one:
    EndingInventory = IF(ItemTotals[YearMonth] = ItemTotals[Last YearMonth Selected],CALCULATE(SUM(ItemTotals[Inventory]),FILTER(ItemTotals,[YearMonth] = [Last YearMonth Selected])))

    I am still stuck on getting a dimension under the metrics.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s