Microsoft Technologies, Power BI, Power Query

Unpivot a matrix with multiple fields on columns in Power Query

I had to do this for a client the other day, and I realized I hadn’t blogged about it. Let’s say you need to include data in a Power BI model, but the only source of the data is a matrix that is output from another system. And that matrix has multiple fields populating the columns. An example of this is below. The matrix has fiscal year and product category on columns, vertical on rows, and the profit metric populating the values.

A matrix in Excel with Fiscal Year and Product Category on Columns and Vertical on Rows with Profit shown in the Measures.

You may know about the wonderful unpivot functionality in Power Query, but to handle this matrix, you first need to Transpose.

The steps to turn this matrix into a table are:

  1. Transpose the query.
  2. Remove the last column that contains the vertical totals.
  3. Filter out the “Totals” value in the second column, which contains the product categories.
  4. Use the first row as headers.
  5. Select the Fiscal Year, Product Category, and Metric columns. Select Unpivot Other Columns.
  6. Rename the Attribute column to Verticals.

Transposing the query swaps rows and columns, so the fields that were on columns become rows.

A query with fiscal year and product category on columns and vertical on rows is transposed. It becomes a query with fiscal year in the first column, product category in the second column, and the verticals become columns instead of rows.
Transposing a query in Power Query

We don’t need the totals columns or rows because Power BI can recalculate those, and we don’t want to double-count profits.

In order to unpivot, we need to promote the first row to column headers, so the first column is labeled Fiscal Year and the fourth column is labeled Vertical Z.

The first three columns are as they should be, but the verticals need to change from columns to rows. This is accomplished by unpivoting. But you only want to unpivot the verticals and leave the fiscal year, product category, and metric columns as they are.

Then make sure column names are user-friendly, and it’s finished. You may also opt to remove the Metric column and rename the value column, if there is only one metric.
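Put together, the applied steps look roughly like the M below. The workbook path, sheet name, and column names (Column2, Fiscal Year, Product Category, Metric) are assumptions based on the example matrix; yours will differ.

let
    // Source is the matrix as exported from the other system (path and sheet name are placeholders)
    Source = Excel.Workbook(File.Contents("C:\Data\ProfitMatrix.xlsx"), null, true){[Item="Matrix", Kind="Sheet"]}[Data],
    // 1. Transpose the query
    Transposed = Table.Transpose(Source),
    // 2. Remove the last column, which contains the vertical totals
    NoTotalColumn = Table.RemoveColumns(Transposed, {List.Last(Table.ColumnNames(Transposed))}),
    // 3. Filter out the "Totals" value in the second column, which contains the product categories
    NoTotalRows = Table.SelectRows(NoTotalColumn, each [Column2] <> "Totals"),
    // 4. Use the first row as headers
    Promoted = Table.PromoteHeaders(NoTotalRows, [PromoteAllScalars = true]),
    // 5. Unpivot everything except Fiscal Year, Product Category, and Metric
    Unpivoted = Table.UnpivotOtherColumns(Promoted, {"Fiscal Year", "Product Category", "Metric"}, "Attribute", "Value"),
    // 6. Rename the Attribute column to Verticals
    Renamed = Table.RenameColumns(Unpivoted, {{"Attribute", "Verticals"}})
in
    Renamed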

A query with 4 columns: fiscal year, product category, verticals, profit
The finished product after transposing and unpivoting
Microsoft Technologies, Power BI, Power Query

Generating Unicode Characters in Power Query

You may have used the UNICHAR() function in DAX to return Unicode characters in DAX measures. If you haven’t yet read Chris Webb’s blog post on the topic, I recommend you do. But did you know there is a Power Query function that can return Unicode characters? This can be useful in cases when you want to assign a Unicode character to a categorical value.

In Power Query, you can use Character.FromNumber to return a Unicode character based upon the specified number.

I recently used this function in a Workout Wednesday for Power BI exercise to visualize music notes. I made a scatterplot that used eighth notes and eighth rests as the markers.

To create an eighth note (♪), I added Character.FromNumber(9834) to my Power Query expression. The eighth rest is Character.FromNumber(119102). There are many music-related Unicode characters available.

With the Character.FromNumber function, we can make a custom column that uses an If statement to determine the correct character to use for each row in a table.
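As a minimal sketch, assuming a previous step named Source and a hypothetical [NoteType] column, such a custom column might look like this:

#"Added Marker" = Table.AddColumn(Source, "Marker", each
    if [NoteType] = "Note"
    then Character.FromNumber(9834)    // eighth note
    else Character.FromNumber(119102), // eighth rest
    type text)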

One thing to note is that the query preview in the Power Query Editor doesn’t always render the Unicode values correctly. This doesn’t mean it won’t work in your report, though.

For instance, Unicode character 119102 doesn’t look like the eighth rest when viewed in the Power Query preview.

The question mark in a box represents a unicode character that could not be properly rendered

But after applying changes and sending the data into the dataset, you’ll see your expected output in the Data view in Power BI Desktop.

Output of the custom column containing Unicode characters in the Data view in Power BI Desktop

The Unicode character does show correctly in Power Query Online when creating a dataflow.

While DAX is the answer for generating music notes or emojis based upon aggregated values, Power Query can help you generate a Unicode value per row in a table by using the Character.FromNumber function.

Microsoft Technologies, Power BI, Power Query

Calling the Intercom API with Power Query and Refreshing in the Power BI Service

I needed to pull some user data for an app that uses Intercom. While I will probably import the data using Data Factory or a function in the long term, I needed to pull some quick data in a refreshable manner to combine with other data already available in Power BI.

I faced two challenges in getting this code to work:

  1. Intercom’s API uses cursor-based pagination when retrieving contacts
  2. I needed this query to be refreshable in PowerBI.com so I could schedule a daily refresh.

If you have worked with the Web.Contents function in Power Query, you may be familiar with all the various ways you can use it that aren’t supported in a refresh on PowerBI.com. Chris Webb’s blog is a great source for this info, if you find yourself stuck with a query that works in Power BI Desktop but not in the service.

The Intercom API

In version 2.4 of the Intercom API, users are a type of contact. You must make an HTTP GET call to https://api.intercom.io/contacts and pass an access token and Accept:application/json in the headers.

In the query string, you can specify the number of contacts per page by specifying per_page=x where x is a number less than or equal to 150. If you have more than 150 contacts, you will need to handle the cursor-based pagination.

When you make the initial call, the API will return a JSON object that contains a total count of contacts and a record for pages. Expanding the pages record shows the current page, the number of contacts per page, and the total number of pages.

type: pages
next: Record
page: 1
per_page: 150
total_pages: 3
Data returned in the Power Query Editor when retrieving contacts from the Intercom API

Expanding the next record gives you page 2 with a starting_after ID.

The cursor used for pagination in the Intercom API as retrieved using Power Query

To get the next page of contacts, the API call would be https://api.intercom.io/contacts?per_page=150&starting_after=[the starting_after ID listed above].

The Power Query Queries

This blog post from Gil Raviv gave me some ideas where to start, but the code in that blog will not refresh in PowerBI.com. You cannot put the iterations and the Web.Contents call and the generated list all in one query if you want to use scheduled refresh.

I ended up creating one query and two functions to accomplish my goal. The second function is optional, but you may find it useful as the time values in the API response are listed as Unix timestamps and you probably want to convert them to datetime values.

The first function contains the API call with an input parameter for the starting_after ID.

//Web API call function
(startafter) as record =>
let
    // The url stays a static string; query string parameters and headers go in the options record
    Source = Json.Document(Web.Contents("https://api.intercom.io/contacts", [Query=[per_page="150", starting_after=startafter], Headers=[Accept="application/json", Authorization="Bearer <your access token goes here>"]])),
    // Contact records for this page; null if the call fails or the field is missing
    data = try Source[data] otherwise null,
    pages = Source[pages],
    ttlpages = pages[total_pages],
    // starting_after cursor for the next page; null when there is no next page
    nextkey = pages[next][starting_after],
    next = try nextkey otherwise null,
    res = [Data=data, Next=next, TotalPages=ttlpages]
in
    res

It’s important that the url passed to the Web.Contents function is a static string. If it is concatenated or dynamic in any way, the query will not be able to refresh in the Power BI service. All the query string parameters can go in the Query option of the Web.Contents call. If you have multiple query string parameters, you can put them in brackets with a comma separating them. You can do the same with multiple headers.

This function attempts the API call and returns null for the data if it encounters an error. Once the data is returned, it retrieves the specific JSON objects needed. The next two lines retrieve the starting_after value, returning null when there is no next page. The function returns the contact data, the starting_after value, and the total_pages value.

I used the following query to call that function.

let
GeneratedList = List.Generate(
   ()=>[i=0, res = fnGetOnePage("")],
   each [res][Data]<>null,
   each [i=[i]+1, res = fnGetOnePage([res][Next])],
   each [res][Data]),
    #"Converted to Table" = Table.FromList(GeneratedList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
    #"Expanded Column2" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1", {"id", "workspace_id", "external_id", "role", "email", "name", "has_hard_bounced", "marked_email_as_spam", "unsubscribed_from_emails", "created_at", "updated_at", "signed_up_at", "last_seen_at", "last_replied_at", "last_contacted_at", "last_email_opened_at", "last_email_clicked_at", "browser", "browser_version", "browser_language", "os", "location", "custom_attributes", "tags", "notes", "companies", "opted_out_subscription_types"}, {"id", "workspace_id", "external_id", "role", "email", "name", "has_hard_bounced", "marked_email_as_spam", "unsubscribed_from_emails", "created_at", "updated_at", "signed_up_at", "last_seen_at", "last_replied_at", "last_contacted_at", "last_email_opened_at", "last_email_clicked_at", "browser", "browser_version", "browser_language", "os", "location", "custom_attributes", "tags", "notes", "companies", "opted_out_subscription_types"}),
    #"Expanded location" = Table.ExpandRecordColumn(#"Expanded Column2", "location", {"type", "country", "region", "city", "country_code"}, {"location.type", "location.country", "location.region", "location.city", "location.country_code"}),
    #"Expanded custom_attributes" = Table.ExpandRecordColumn(#"Expanded location", "custom_attributes", {"role", "brand", "Subdomain"}, {"custom_attributes.role", "custom_attributes.brand", "custom_attributes.Subdomain"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded custom_attributes",{"opted_out_subscription_types", "tags", "notes", "companies", "role"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"id", type text}, {"workspace_id", type text}, {"external_id", type text}, {"email", type text}, {"name", type text}, {"has_hard_bounced", type logical}, {"marked_email_as_spam", type logical}, {"unsubscribed_from_emails", type logical}, {"created_at", Int64.Type}, {"updated_at", Int64.Type}, {"signed_up_at", Int64.Type}, {"last_seen_at", Int64.Type}, {"last_replied_at", Int64.Type}, {"last_contacted_at", Int64.Type}, {"last_email_opened_at", Int64.Type}, {"last_email_clicked_at", Int64.Type}}),
    #"Replace Null with 0" = Table.ReplaceValue(#"Changed Type",null,0,Replacer.ReplaceValue,{"created_at","updated_at","signed_up_at","last_replied_at","last_seen_at","last_contacted_at","last_email_opened_at","last_email_clicked_at"
}),
    #"Invoked Custom Function" = Table.AddColumn(#"Replace Null with 0", "created_at_dt", each fnUnixToDateTime([created_at])),
    #"Invoked Custom Function1" = Table.AddColumn(#"Invoked Custom Function", "updated_at_dt", each fnUnixToDateTime([updated_at])),
    #"Invoked Custom Function2" = Table.AddColumn(#"Invoked Custom Function1", "signed_up_at_dt", each fnUnixToDateTime([signed_up_at])),
    #"Invoked Custom Function3" = Table.AddColumn(#"Invoked Custom Function2", "last_seen_at_dt", each fnUnixToDateTime([last_seen_at])),
    #"Invoked Custom Function4" = Table.AddColumn(#"Invoked Custom Function3", "last_replied_at_dt", each fnUnixToDateTime([last_replied_at])),
    #"Invoked Custom Function5" = Table.AddColumn(#"Invoked Custom Function4", "last_contacted_at_dt", each fnUnixToDateTime([last_contacted_at])),
    #"Invoked Custom Function6" = Table.AddColumn(#"Invoked Custom Function5", "last_email_opened_at_dt", each fnUnixToDateTime([last_email_opened_at])),
    #"Invoked Custom Function7" = Table.AddColumn(#"Invoked Custom Function6", "last_email_clicked_at_dt", each fnUnixToDateTime([last_email_clicked_at])),
    #"Fix Null Values" = Table.ReplaceValue(#"Invoked Custom Function7",DateTime.From("1970-01-01"),null,Replacer.ReplaceValue,{"created_at_dt","updated_at_dt","signed_up_at_dt","last_replied_at_dt","last_seen_at_dt","last_contacted_at_dt","last_email_opened_at_dt","last_email_clicked_at_dt"
}),
    #"Removed Columns1" = Table.RemoveColumns(#"Fix Null Values",{"created_at", "updated_at", "signed_up_at", "last_seen_at", "last_replied_at", "last_contacted_at", "last_email_opened_at", "last_email_clicked_at"})
in
    #"Removed Columns1"

The List.Generate call generates the rows I need to call my first function. It sets an iterator variable to 0 and then calls the function, which returns the first page of data along with the total pages and the starting_after ID. As long as data is returned, it makes the API call again with the previously returned starting_after ID. This creates a list of lists that can be converted into a table of records. Then the records can be expanded to fields.

I expanded several columns out into multiple columns. Then I adjusted the data types of my columns to the correct types (they came back as Any data type).

There were several columns that contained Unix timestamps. All of the custom function calls return the datetime version of those values. I needed to handle null values in the timestamp conversion, so I replaced all the null timestamps with 0, converted them, and then converted the resulting datetime value of 1-Jan-1970 back to null. I did the replace for all eight of these columns in one step. Then I removed the original columns that contained the Unix timestamps, as they were not analytically relevant for me.

Below is my Unix timestamp to datetime conversion function.

// Converts a Unix timestamp (seconds since 1 Jan 1970) to a datetime value
(UnixTime as number) as datetime =>
let
    DT = #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, UnixTime)
in
    DT

Advice and warnings

When using an API key that must be passed in the headers, it is safest to use a custom connector. Otherwise, you have to embed your API key in the M code, as shown above. When the query is sent to Intercom, it is encrypted using HTTPS. But anyone that opens your PBIX file would have access to it. Your key will be captured in the Power BI logs. And anyone that can manage to intercept your web request and decrypt your traffic would have access to it. This is not ideal. But creating a custom connector requires more advanced code and a gateway to make it usable in the Power BI service. With either option, you will choose Anonymous authentication for the data source.

Be sure to use the RelativePath and Query options in the Web.Contents call. This is necessary to make the query refreshable in the service. The value passed to the first parameter of Web.Contents must be a static string and must be valid in itself (no errors returned).
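Here is a minimal sketch of that pattern (not the exact call used in the function above): the base url stays a static string that is valid on its own, while the path and query string parameters go in the options record.

let
    startafter = "",  // cursor value; empty on the first call
    Source = Json.Document(
        Web.Contents(
            "https://api.intercom.io",
            [
                RelativePath = "contacts",
                Query = [per_page = "150", starting_after = startafter],
                Headers = [Accept = "application/json", Authorization = "Bearer <your access token goes here>"]
            ]
        )
    )
in
    Source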

After publishing your report, you’ll need to set the credentials for the dataset before you can refresh it. Be sure to check the Skip test connection box. Otherwise, your credentials update will fail.

url: https://api.intercom.io/contacts
Authentication method: anonymous
Privacy level: organizational 
Skip test connection: yes
The Skip test connection option for the web data source in the dataset settings in PowerBI.com

Even though we are using anonymous authentication, you can still choose an Organizational privacy level.

If you want only app users from your Intercom contacts list, you need to filter the results on role = “user”, since contacts also include leads. In my case, the role attribute was in the custom attributes returned by the API.
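A hedged sketch of that filter, written as an additional step appended to the query above (assuming the expanded custom_attributes.role column):

    #"Filtered to Users" = Table.SelectRows(#"Removed Columns1", each [custom_attributes.role] = "user")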

It took a bit of experimentation to get this working, so I thought I would share what I found to work. Power BI is always changing, and a better way to do this may be available in the future.

If you would like this development experience to improve, here are some Power BI Ideas to vote for:

DAX, Microsoft Technologies, Power BI, Power Query

Calculating Age in Power BI

In week 26 of Workout Wednesday for Power BI, I asked people to calculate the age of Nobel laureates at the time they received the award. I provided some logic, but I didn’t prescribe how to create the age calculation. This inspired a couple of questions and a round of data validation as calculating age may be trickier than you think. In this post, I’ll explore some of the ways people have calculated age in Power BI and the edge cases where those calculations may not work.

In my solution video for Workout Wednesday, I used Power Query to calculate age. This was inspired by several blog posts and videos I had seen previously. There is an Age menu option in the Power Query editor under Date.

Calculating Age with the Power Query Editor user interface

When you select a date column and use that Age option, it calculates the duration between the selected date and the current date in days. You must then replace the current date with the second date column. Next you can choose Total Years under Duration, which divides the days by 365. Finally, you must round that number down to the next integer to get years.

If you follow Ruth’s video, you can do all of that in one step that creates a custom column with the final age value.

 Number.RoundDown(Duration.TotalDays([Date2] - [Date1])/365) 

That is the most common option in Power Query as there is no DateDiff function.

There are a few options for calculating age in DAX. Some people use the DATEDIFF function.

Age DateDiff = DATEDIFF([Date1],[Date2],YEAR) 

Another way I have seen is to use YEARFRAC function.

Age YearFrac = INT ( YEARFRAC ( [Date1], [Date2], 1 ) )

The way Marco Russo suggests is to use QUOTIENT.

Age Quotient (DAX):

Age Quotient =
VAR Birthdate = [Date1]
VAR ThisDay = [Date2]
VAR IntBirthdate = YEAR ( Birthdate ) * 10000 + MONTH ( Birthdate ) * 100 + DAY ( Birthdate )
VAR IntThisDay = YEAR ( ThisDay ) * 10000 + MONTH ( ThisDay ) * 100 + DAY ( ThisDay )
VAR Age = QUOTIENT ( IntThisDay - IntBirthdate, 10000 )
VAR CheckedAge = DIVIDE ( Age, NOT ISBLANK ( Birthdate ) )
RETURN
    CheckedAge

As Marco points out, many people were using YEARFRAC, but there is a bug in the DAX implementation that causes it to occasionally return an incorrect answer for this purpose.

Checking the Numbers

I created a Power BI file to demonstrate the differences in these four calculations. You can download the file here. The image below displays the results in several tests. For each row, I’m using Date1 as the birthdate and Date2 as the “as of” date. You’ll notice that I focused on leap years for a few cases.

Table in Power BI with 10 date ranges showing the results from the four calculations. 6 of the 10 rows have different results across the calculations.
Example date ranges and result of the four age calculations

There are six of ten date ranges that have different results across the different calculation methods.

In the second row, the Power Query age calculation says that Feb 29 to Feb 28 in the following year is a full year. This may or may not be what you want depending on your requirements. I’m noting the difference so you can be aware. A similar thing occurs in the fifth row going from Feb 29, 2016 to Feb 28, 2020, and again on the 9th row going from March 1, 2019 to Feb 29, 2020.

On the third row, notice that the DAX DATEDIFF function calculates Feb 29 to Feb 27 of the following year to be a full year, despite it being a day or two short. Depending on what you do with leap years, you might consider Feb 29 to Feb 28 in the following year to be a full year, but that third row result means DATEDIFF is probably not the calculation I want. We see a similar result going from March 1 to Feb 28 of the following year.

YEARFRAC calculates that Feb 29 to Feb 28 in the following year is not a full year, which may be desirable. But it counts Feb 29, 2016 to Feb 29, 2020 as only three years. And we see that March 1, 2000 to March 1, 2021 is only counted as 20 years. So even without starting on a leap year, we get some incorrect results. The results seem to be correct for smaller spans until you get to about 13 years.

Using the QUOTIENT function provides what I consider to be the most correct results. It calculates Feb 29 to Feb 28 of the following year to be less than a year. It calculates Feb 29, 2016 to Feb 28, 2020 to be three years and not four. And it calculates March 1 to Feb 29 of the following year to be less than a year.

Which to use?

The QUOTIENT formula produces the most accurate results if you don’t want Feb 29 to Feb 28 the next year to be counted as a year. DATEDIFF and YEARFRAC produce too many incorrect results for me to ever suggest using them. Since there is a DAX option that produces more correct answers, I would just go for QUOTIENT instead of either of these two.

UPDATE: There is a better alternative! Imke Feldmann reminded me that there is a Number.IntegerDivide function in Power Query. So let’s take the logic from Marco’s DAX calculation and move it to Power Query:

(BirthDate as date, EndDate as date) =>
let
    BirthDateInt = Date.Year(BirthDate) * 10000 + Date.Month(BirthDate) * 100 + Date.Day(BirthDate),
    EndDateInt = Date.Year(EndDate) * 10000 + Date.Month(EndDate) * 100 + Date.Day(EndDate),
    Age = Number.IntegerDivide(EndDateInt - BirthDateInt, 10000)
in
    Age
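Invoking the function in a custom column might look like the step below. The function name (fnAge) and column names are placeholders for illustration.

    #"Added Age" = Table.AddColumn(#"Previous Step", "Age", each fnAge([BirthDate], [AwardDate]), Int64.Type)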

The Power Query custom column created by invoking this function should produce better compression than a DAX calculated column. This might not be significant for a small dataset, but we should be efficient when we can.

Microsoft Technologies, Power BI, Power Pivot, Power Query, SSAS

Documenting your Tabular or Power BI Model

If you were used to documenting your SSAS model using the MDSchema rowsets, you might have noticed that some of them do not work with the new tabular models. For example, the MDSCHEMA_MEASUREGROUP_DIMENSIONS DMV seems to just return one row per table rather than a list of the relationships between tables.

Not to worry, though. With the new compatibility level 1200 SSAS Tabular models came some new DMVs. They are, unfortunately, not documented at this time (as of 3 Oct 2016). I expect them to show up here at some point in the future. Until then I published a Gist that provides a list of those views as well as some notes on what kind of useful info is in each one. While there are 36 TMSCHEMA DMVs, the main DMVs you will want to check out for model documentation purposes are:

  • TMSCHEMA_COLUMNS
  • TMSCHEMA_MEASURES
  • TMSCHEMA_MODEL
  • TMSCHEMA_RELATIONSHIPS
  • TMSCHEMA_TABLES

There are a few fields that show IDs rather than descriptions (e.g., ExplicitDataType in TMSCHEMA_COLUMNS, Type in TMSCHEMA_DATA_SOURCES). I’ve been able to figure out the descriptions that correspond to some of the values, but I’m still missing a few.

As with the MDSCHEMA DMVs, the TMSCHEMA DMVs can be used to document your model. This also works with Power BI models. You’ll just need to change the connection information to your Power BI instance while the .pbix file is open in Power BI Desktop. For more information on analyzing your Power BI model, see Chris Webb’s post. Since DMVs require DMX queries, you are somewhat limited in your ability to join the data from the DMVs together. This makes it necessary to store the data somewhere before transforming and merging them. Since I needed to document a tabular model for a client, I created a Power Pivot model in Excel 2016. You can download my documentation model here. My example model was connected to a quick demo SSAS model I made based upon the Wide World Importers DW database.
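As a rough sketch of how one of these DMVs can be pulled into Power Query, the AnalysisServices.Database function accepts a query string. The server and database names below are placeholders; the downloadable workbook parameterizes them with the TabularInstanceName and TabularDBName queries mentioned further down.

let
    // Placeholder server and database names
    Source = AnalysisServices.Database("localhost\tabular", "WideWorldImportersDW", [Query="SELECT * FROM $SYSTEM.TMSCHEMA_MEASURES"])
in
    Source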

My Power Pivot Model

My Power Pivot model uses Power Query to import the data from the DMVs. This enables me to merge data into a model that I feel makes the most sense and doesn’t suffer from too many relationships and hidden tables (which is what would happen if you just imported the results of each DMV query into the model). In order to make this work for any model, I employed a trick I learned from Chris Webb to avoid duplicating connection information across queries. I built my model with the normal embedded connection strings and then converted them to use the Connection Info tables. As I did that, I ran into a Power Query limitation as evidenced by the error message below.

“Formula.Firewall: Query ‘QueryName’ (step ‘StepName’) references other queries or steps and so may not directly access a data source. Please rebuild this data combination.”

Ken Puls has a great explanation of how to get around this issue. Basically, you have to redesign your queries so that you have “staging tables”, which you can then use to build finalized tables. This redesign gets you around the error since the finalized tables don’t have the embedded source connection, just a reference to the results from another query.

This Excel file documents:

  • Database and model info
  • Tables and source queries
  • Columns
  • Hierarchies
  • Measures
  • KPIs
  • Security roles, membership, and permissions
  • Relationships
  • Perspectives and the columns, hierarchies, and measures they contain

The Database Info tab uses cube functions to display the values shown. The rest of the tabs use pivot tables and slicers.

You can download and use this model if you have Excel 2016 or 2013 with Power Query. Just download the file, change the values in the TabularInstanceName and TabularDBName queries, and refresh. I was able to open the file and update the queries with no issues and no changes in Excel 2013.

This is great if you just need documentation or a quick way to look up specific information. But we could really analyze this using visualizations other than tables in Power BI Desktop.

My Power BI Model

I imported the Excel model into Power BI, made a few adjustments, and then added some visualizations. This report is a bit pared down from a visualization standpoint compared to the Excel file because I eliminated some of the data that wasn’t analytically relevant for me.

Click here to view this report in Power BI.

On the first page, I have several summary numbers that tell me about the contents of the model. It gives me a decent idea of the size and complexity of the model.

tabular-doc-summary

The second page uses a bar chart as an interactive filter against a list of table source queries and columns.

The third page shows relationships between the tables. Slicers on the left filter both the force directed graph and the table. Smart filters at the top of the page let you filter the Table and To Table in the relationships.

tabular-doc-relationships

The fourth page shows the details behind each measure. A column chart at the bottom provides a count of measures by table and whether the measure is hidden, which acts as another interactive filter.

The last page shows security by role and table, again using column charts as interactive filters.

Check out the Demo Day video I made on the BlueGranite blog for further explanation and a demo of the Tabular Model documentation.

Update 19 Sep 2017: I have turned the Power BI report into a Power BI Template for easier use. You can download it here.

Microsoft Technologies, Power BI, Power Query

Using Power Query to Transform Website Data with Multiple Rows Per Entity

My colleague Hope Foley and I both enjoy a good craft beer. She has a great presentation on spatial data in SQL Server, which contains data about breweries. She mentioned she was gathering new data to add to her brewery database and showed me the brewery listing on the Brewery Collectibles Club of America website. This web page presents a challenge because of the way the data is laid out. Rather than having a table with one row per brewery, the web page uses one table per brewery with two rows in each table.
pqbeerwebsite
I typically use Microsoft Power Query to scrape data from websites for further analysis. Although this requires a bit of creativity (and some M), I knew Power Query was up to the task. And it gave me a chance to use my favorite Power Query function: Unpivot.

I opened up Excel and established my data source in Power Query. I chose From Web and entered the url: http://www.bcca.com/services/brewery_listing.asp. Once the data source loaded, I could see that there were over 3,000 tables within the page.

pqbeertablemenu

Choosing table 0 showed me the column headings for the overall table on the page.

PQbeertableheader

I could see the headings for the 6 fields, arranged in 2 rows as we see on the webpage. I decided to first figure out how to transform this table into the one row I needed, and then automate the process so I could reproduce the results for each subsequent table. I used the Power Query GUI to transform my table into 1 row with the following steps.

  1. On the Add Column tab, choose Add Index Column -> From 0.
  2. Select the Index column. On the Transform Tab, choose Unpivot Columns -> Unpivot Other Columns. This creates a table with 3 columns: Index, Attribute, and Value.
  3. Select the Index and Attribute columns. On the Home tab, choose Remove Columns -> Remove Columns.
  4. On the Transform tab, choose Transpose.


This creates the following M code, which you can see by clicking on Advanced Editor on the View tab.

let
 Source = Web.Page(Web.Contents("http://www.bcca.com/services/brewery_listing.asp")),
 Data0 = Source{0}[Data],
 #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
 #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
 #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
 #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Index", "Attribute"}),
 #"Transposed Table" = Table.Transpose(#"Removed Columns")
in
 #"Transposed Table"

Next I used that query to create a function by changing just two lines.

(gettable) =>
let
 Source = Web.Page(Web.Contents("http://www.bcca.com/services/brewery_listing.asp")),
 Data0 = Source{(gettable)}[Data],
 #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
 #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
 #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
 #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Index", "Attribute"}),
 #"Transposed Table" = Table.Transpose(#"Removed Columns")
in
 #"Transposed Table"

I can see the number of tables available in my data source, so I can create a query that invokes the function for each table index, in my case 0 through 3744. As a final touch, I used the first row as headers.

let
Source = Table.Combine(List.Transform({0..3744}, Query1)),
#"First Row as Header" = Table.PromoteHeaders(Source)
in
#"First Row as Header"

Et voilà!

PQBeerFinalList

I now have a single table of data with one row per brewery.

Excel, Microsoft Technologies, Power BI, Power Pivot, Power Query

Moving Calculated Measures in Power Pivot for Excel 2013

I learned a lesson the hard way: I shouldn’t change field names and data types in Power Pivot on tables that were imported using Power Query. My changes broke the connection between the two tools, so when I refreshed a query in Power Query that was set to load the results to my data model it caused a new/duplicate table to be created in my data model instead of updating the original table.

PPBlogTables

I already had Power View reports built off of my model that I didn’t want to rebuild, and I didn’t want to leave a duplicate table in my model, but I needed to make the data model refreshable. The issue can be fixed, but it’s a bit tedious:

  1. Create relationships for the new table to match those of the old table
  2. Rename the old table to something else
  3. Rename the new table to the original table name (usually it has a 1 after the table name)
  4. Rebuild or move any calculated measures to the new table
  5. Delete the old table

I was fine until I got to #4. You can’t just copy a calculated measure to a new table by cutting and pasting. You can copy the formula from the formula bar and paste it into a cell on another table, but you will get a warning that you can’t have two measures with the same name.

PPBLogDupName

Going this route means pasting your measure, renaming the old measure, and then renaming the new copy of the measure to the original name. Because Power Map lacks the ability to filter data, I had to create several calculated measures in my data model to limit the data shown in my map. The idea of copying, pasting, and renaming several dozen measures was less than appealing.

It should be noted that in the current version of Power Pivot, you can put calculated measures on any table in the model without affecting the result of a query using the measure. The table in which it is located is more of an organization issue rather than a technical calculation issue.

I thought there must be a better way to move a measure. I looked through all of the buttons on the ribbon in the Power Pivot window and saw nothing helpful so I turned to Google (yes, I still choose Google over Bing). A TechNet article provided just the guidance I needed:

“To move a measure, you can cut and paste the formula in the Calculation Area. You can only move it to a different location in the Calculation Area of the table in which it was created. You cannot move a measure to different table in this view; instead use the Measure Settings dialog in the PowerPivot ribbon in Excel to change the association.”

The Measure Settings dialog box held the answer, although I found these settings under Calculated Fields -> Manage Calculated Fields.

PPBlogManageFields

It contains a drop-down box that allows me to choose the table where the definition of the measure should be stored. This was a much quicker way to move several calculated measures to another table in Power Pivot.

PPBlogcalcFieldSet

I have built several Power Pivot models and never run into this particular situation. I can’t say I’m glad it happened, but I can appreciate that it reminded me that there are useful things on the Power Pivot ribbon in the Excel window that aren’t accessible through the Power Pivot window.

Data Visualization, Excel, Microsoft Technologies, Power Query, Power View

Most Popular Names Visualized in Excel

A couple of months ago, I came across an article in the Atlantic that showed an animated gif of the most popular baby names by state by year. I decided to visualize this data using various add-ins and chart types in Excel.

Retrieving the Data

I found the data on the Social Security Administration website. This data showed the top 5 girl names in a table for each state for a given year. I needed to pull in multiple years, so I used Power Query and wrote a bit of M. I followed the guidance from a great blog post by Devin Knight:

  1. Choose From Web and enter the url for a page containing any year of data.
  2. Modify the M to create a function that accepts a year parameter and uses that to formulate the correct url for each year.
  3. Create a second query from scratch and provide an array of years.
  4. Convert this data to a table and insert a calculated column that pulls in the data from the first query.
  5. Expand the columns, rename columns, and change to appropriate data types where necessary.

For more detail, see Devin’s post. He does a great job of explaining and providing screenshots.
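As a rough sketch of steps 1 and 2, the year-parameterized function might look something like the following. The url pattern and the table index are placeholders for illustration; the actual SSA page address will differ.

(year as text) =>
let
    // Placeholder url pattern; substitute the real page address for a year of data
    Source = Web.Page(Web.Contents("https://www.ssa.gov/babynames/state/top5_" & year & ".html")),
    // Placeholder table index; pick the table that holds the top-5 listing
    Data = Source{0}[Data]
in
    Data

A second query can then supply the array of years and invoke this function for each one, as described in steps 3 through 5.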

Once I had my data it was time to show it off.

Power Map

My first instinct was to use Power Map since its specialty is showing changes in geospatial data over time. After trying each type of chart in Power Map, I found that the region shading was the closest to what I wanted.

Power Map Baby Name

Although I wasn’t just trying to duplicate the animated gif, I found it difficult to display this data in an equally effective way in Power Map due to the lack of ability to add data labels or customize annotations. There was no good way to show the most popular name on the state. I chose to use annotations to display the name. You can use data fields to populate the title and description of an annotation. The biggest drawback of using annotations is that there is no way to resize or relocate them. If I were to add annotations for each state, they would overlap and you wouldn’t be able to read many of them, nor would you be able to see most of the map. I was able to add enough annotations that there was usually one annotation for each of the names shown on the map at any time.

One other issue I noticed is that occasionally an annotation will fail to update on time, so the data in the annotation will be old and correspond to a previous year. If you watch the video for my Power Map, you will see that the annotation for Virginia gets stuck on 1983 for a few years, but it eventually catches up. Update: Microsoft responded to my question on this. We figured out that the annotations will update as expected if you set the Time to Date(Year) and then change the time settings to “Data Shows for an Instant”. I have updated my workbook with these settings.

You can watch the video of my Power Map tour here. Outside of the inability to add good labels, watching the states change color over time works well. You can have a legend that shows the names with the colors, but it will show every name from the data set and will not be an effective use of space.

Power View

Power View provides the ability to create bubble maps. I made a bubble map tiled by year.

Power View Baby Name

If you click on a year tile, you can then use the arrow keys to flip through the years; you can also use the scroll bar below the tiles. I think this provides a better visual than the Power Map for this situation since it is easier to see which names go with which states. Because of the bubble colors, I can see how the trend went from Jennifer to Jessica to Ashley in the 1980s. We also notice that there are 5+ top names from the 1990s forward, whereas the 1980s had 2 – 4. The only real problem I encountered with the map in Power View was the limitation of the colors in the themes. When there are several names on the map, some of the colors can be very similar. I tried several themes and couldn’t find one where this wasn’t the case.

Pivot Table

Next I decided to stray from maps and see if there were other chart choices that suited this data.  I created a pivot table with data bars and a timeline slicer.

Excel pivot baby name

I can still tell which baby names were most popular across the US and see how the trends change over time. The piece of information that I lose in moving away from maps is seeing how trends affect states that are geographically near each other. For instance, I can tell that Jessica was popular in 1985 in states such as California, Florida, New York, and Michigan. But I don’t immediately recognize that Jessica was popular only in New England, Florida, and the Mountain West and West Coast because Ashley had taken over the Mid-Atlantic and Midwest. One advantage of the pivot table is that I can select multiple years. So I can learn things like Ashley was the most popular name overall in the 1990s.

Adventures with Apps for Office

Microsoft Research published several free apps for data visualization in Excel.  I decided to use a couple of them to visualize the baby name data. You can access these apps on the Insert tab.  Click Apps for Office -> See All.  Then click Find More Apps at the Office Store. One thing to note about these apps is that they require the data to be formatted in a certain way.  See the instructions in the links below for details on this.

apps for office

Streamgraph

A streamgraph shows how a set of numbers has changed over time using their relative area. It’s kind of like an area chart that doesn’t use an absolute Y axis.

steam graph baby names

I do not think this is an effective way to display the data.  Area graphs and streamgraphs both fail when trying to display a large number of values.  In this specific case, you can only pick a color and the shades are determined for you. It’s difficult to distinguish some of the shades from others. The fact that the values for the names can overlap and aren’t a specific shape (like a rectangle where you only have to judge length) make it difficult to interpret quickly.  You lose most of the advantages of pre-attentive processing because you really have to study the graph to get information from it.

Treemap

A treemap is a hierarchical view of data that breaks it into rectangles and uses size and color to communicate information.

tree map

For some reason there is a size limitation on this treemap, so I couldn’t make it big enough for all the names to fit in the boxes. Since the treemap plots the boxes so they fit into the overall rectangle, years are not placed in sequential order. In my treemap, both the color and the size represent the count of names for the year. Using a treemap removes the ability for the viewer to see trends over time. I have to look in the bottom row for 1980 – 1982 and then jump to the top left to find 1985. I also had to summarize to top names by year, losing any geographical/state information. One thing this does do well is compare 2 – 3 values for a name across years. For instance, I can compare the shade of green for Jennifer in 1981, 1983, and 1985 to see the decline in popularity of Jennifer over time.

And the winner is…

I think the Power View report does the best job of communicating all of the available information from the data set while being visually appealing. We can see geographic relationships and temporal relationships. It’s easy to read and quickly see which names were top in which states in any year.

You can download my Excel file here if you would like to check it out or try to improve upon it.