Microsoft Technologies, Power BI, Power Pivot, Power Query, SSAS

Documenting your Tabular or Power BI Model

If you were used to documenting your SSAS model using the MDSchema rowsets, you might have noticed that some of them do not work with the new tabular models. For example, the MDSCHEMA_MEASUREGROUP_DIMENSIONS DMV seems to just return one row per table rather than a list of the relationships between tables.

Not to worry, though. With the new compatibility level 1200 SSAS Tabular models came some new DMVs. They are, unfortunately, not documented at this time (as of 3 Oct 2016). I expect them to show up here at some point in the future. Until then I published a Gist that provides a list of those views as well as some notes on what kind of useful info is in each one. While there are 36 TMSCHEMA DMVs, the main DMVs you will want to check out for model documentation purposes are:

  • TMSCHEMA_COLUMNS
  • TMSCHEMA_MEASURES
  • TMSCHEMA_MODEL
  • TMSCHEMA_RELATIONSHIPS
  • TMSCHEMA_TABLES

There are a few fields that show IDs rather than descriptions (e.g., ExplicitDataType in TMSCHEMA_COLUMNS, Type in TMSCHEMA_DATA_SOURCES). I’ve been able to figure out the descriptions that correspond to some of the values, but I’m still missing a few.

As with the MDSCHEMA DMVs, the TMSCHEMA DMVs can be used to document your model. This also works with Power BI models. You’ll just need to change the connection information to your Power BI instnce while the .pbix file is open in Power BI Desktop. For more information analyzing your Power BI model, see Chris Webb’s post. Since DMVs require DMX queries, you are somewhat limited in your ability to join the data from the DMVs together. This makes it necessary to store the data somewhere before transforming and merging them. Since I needed to document a tabular model for a client, I created a Power Pivot model in Excel 2016. You can download my documentation model here.  My example model was connected to a quick demo SSAS model I made based upon the Wide World Importers DW database.

My Power Pivot Model

My Power Pivot model uses Power Query to import the data from the DMVs. This enables me to merge data into a model that I feel makes the most sense and doesn’t suffer from too many relationships and hidden tables (which is what would happen if you just imported the results of each DMV query into the model). In order to make this work for any model, I employed a trick I learned from Chris Webb to avoid duplicating connection information across queries. I built my model with the normal embedded connection strings and then converted them to use the connection Connection Info tables. As I did that, I ran into a Power Query limitation as evidenced by the error message below.

“Formula.Firewall: Query ‘QueryName’ (step ‘StepName’) references other queries or steps and so may not directly access a data source. Please rebuild this data combination.”

Ken Puls has a great explanation of how to get around this issue. Basically, you have to redesign your queries so that you have “staging tables”, which you can then use to build finalized tables. This redesign gets you around the error since the finalized tables don’t have the embedded source connection, just a reference to the results from another query.

This Excel file documents:

  • Database and model info
  • Tables and source queries
  • Columns
  • Hierarchies
  • Measures
  • KPIs
  • Security roles, membership, and permissions
  • Relationships
  • Perspectives and the columns, hierarchies, and measures they contain

The Database Info tab uses cube functions to display the values shown. The rest of the tabs use pivot tables and slicers.

You can download and use this model if you have Excel 2016 or 2013 with Power Query. Just download the file, change the values in the TabularInstanceName and TabularDBName queries, and refresh. I was able to open the file and update the queries with no issues and no changes in Excel 2013.

This is great if you just need documentation or a quick way to look up specific information. But we could really analyze this using visualizations other than tables in Power BI Desktop.

My Power BI Model

I imported the Excel model into Power BI, made a few adjustments, and then added some visualizations. This report is a bit paired down from a visualization standpoint compared to the Excel file because I eliminated some of the data that wasn’t analytically relevant for me.

Click here to view this report in Power BI.

On the first page, I have several summary numbers that tell me about the contents of the model. It gives me a decent idea of the size and complexity of the model.

tabular-doc-summary

The second page uses a bar chart as an interactive filter against a list of table source queries and columns.

The third page shows relationships between the tables. Slicers on the left filter both the force directed graph and the table. Smart filters at the top of the page let you filter the Table and To Table in the relationships.

tabular-doc-relationshipsThe fourth page shows the details behind each measure. A column chart at the bottom provides a count of measures by table and whether the measure is hidden, which acts as another interactive filter.

The last page shows security by role and table, again using column charts as interactive filters.

You can download the Power BI model here.

Update: Check out the Demo Day video I made on the BlueGranite blog for further explanation and a demo of the Tabular Model documentation. 

Microsoft Technologies, Power BI, Power Query

Using Power Query to Transform Website Data with Multiple Rows Per Entity

My colleague Hope Foley and I both enjoy a good craft beer. She has a great presentation on spatial data in SQL Server, which contains data about breweries. She mentioned she was gathering new data to add to her brewery database and showed me the brewery listing on the Brewery Collectibles Club of America website. This web page presents a challenge because of the way the data is laid out. Rather than having a table with one row per brewery, the web page uses one table per brewery with two rows in each table.
pqbeerwebsite
I typically use Microsoft Power Query to scrape data from websites for further analysis. Although this requires a bit of creativity (and some M), I knew Power Query was up to the task. And it gave me a chance to use my favorite Power Query function: Unpivot.

I opened up Excel and established my data source in Power Query. I chose From Web and entered the url: http://www.bcca.com/services/brewery_listing.asp. Once the data source loaded, I could see that there were over 3,000 tables within the page. pqbeertablemenu

Choosing table 0 showed me the columns headings for the overall table on the page.

PQbeertableheader

I could see the headings for the 6 fields, arranged in 2 rows as we see on the webpage.  I decided to figure out how to transform this table into the one row I needed and then figure out how to automate my process so I could reproduce the results for each subsequent table. I used the Power Query GUI to transform my table into 1 row with the following steps.

  1. On the Add Column tab, choose Add Index Column -> From 0.
  2. Select the Index column. On the Transform Tab, choose Unpivot Columns -> Unpivot Other Columns. This creates a table with 3 columns: Index, Attribute, and Value.
  3. Select the Index and Attribute columns. On the Home tab, choose Remove Columns -> Remove Columns.
  4. On the Transform tab, choose Transpose.

 

This creates the following M code, which you can see by clicking on Advanced Editor on the View tab.

let
 Source = Web.Page(Web.Contents("http://www.bcca.com/services/brewery_listing.asp")),
 Data0 = Source{0}[Data],
 #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
 #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
 #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
 #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Index", "Attribute"}),
 #"Transposed Table" = Table.Transpose(#"Removed Columns")
in
 #"Transposed Table"

Next I used that query to create a function by changing just two lines.

(gettable) =>
let
 Source = Web.Page(Web.Contents("http://www.bcca.com/services/brewery_listing.asp")),
 Data0 = Source{(gettable)}[Data],
 #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
 #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
 #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
 #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Index", "Attribute"}),
 #"Transposed Table" = Table.Transpose(#"Removed Columns")
in
 #"Transposed Table"

I can see the number of tables available in my data source, so I can create a query that invokes the function for the required number of tables, in my case 3744. As a final touch, I used the first row as headers.

let
Source = Table.Combine(List.Transform({0..3744}, Query1)),
#"First Row as Header" = Table.PromoteHeaders(Source)
in
#"First Row as Header"

Et voilà!

PQBeerFinalList

I now have a single table of data with one row per brewery.

Excel, Microsoft Technologies, Power BI, Power Pivot, Power Query

Moving Calculated Measures in Power Pivot for Excel 2013

I learned a lesson the hard way: I shouldn’t change field names and data types in Power Pivot on tables that were imported using Power Query. My changes broke the connection between the two tools, so when I refreshed a query in Power Query that was set to load the results to my data model it caused a new/duplicate table to be created in my data model instead of updating the original table.

PPBlogTables

I already had Power View reports built off of my model that I didn’t want to rebuild, and I didn’t want to leave a duplicate table in my model, but I needed to make the data model refreshable. The issue can be fixed, but it’s a bit tedious:

  1. Create relationships for the new table to match those of the old table
  2. Rename the old table to something else
  3. Rename the new table to the original table name (usually it has a 1 after the table name)
  4. Rebuild or move any calculated measures to the new table
  5. Delete the old table

I was fine until I got to #4. You can’t just copy a calculated measure to a new table by cutting and pasting. You can copy the formula from the formula bar and paste it into a cell on another table, but you will get a warning that you can’t have two measures with the same name.

PPBLogDupName

Going this route means pasting your measure, renaming the old measure, and then renaming the new copy of the measure to the original name.  Due to the lack of the ability to filter data in Power Map, I had to create several calculated measures in my data model to limit the data shown in my map.  The idea of copying, pasting, and renaming several dozen measures was less than appealing.

It should be noted that in the current version of Power Pivot, you can put calculated measures on any table in the model without affecting the result of a query using the measure. The table in which it is located is more of an organization issue rather than a technical calculation issue.

I thought there must be a better way to move a measure. I looked through all of the buttons on the ribbon in the Power Pivot window and saw nothing helpful so I turned to Google (yes, I still choose Google over Bing). A TechNet article provided just the guidance I needed:

“To move a measure, you can cut and paste the formula in the Calculation Area. You can only move it to a different location in the Calculation Area of the table in which it was created. You cannot move a measure to different table in this view; instead use the Measure Settings dialog in the PowerPivot ribbon in Excel to change the association.”

The Measure Settings dialog box held the answer, although I found these settings under the Calculated Fields -> Managed Calculated Fields.  PPBlogManageFields

It contains a drop-down box that allows me to choose the table where the definition of the measure should be stored. This was a much quicker way to quickly move several calculated measures to another table in Power Pivot.

PPBlogcalcFieldSet

I have built several Power Pivot models and never run into this particular situation. I can’t say I’m glad it happened, but I can appreciate that it reminded me that there are useful things on the Power Pivot ribbon in the Excel window that aren’t accessible through the Power Pivot window.

Data Visualization, Excel, Microsoft Technologies, Power Query, Power View

Most Popular Names Visualized in Excel

A couple of months ago, I came across an article in the Atlantic that showed an animated gif of the most popular baby names by state by year. I decided to visualize this data using various add-ins and chart types in Excel.

Retrieving the Data

I found the data on the Social Security Administration website. This data showed the top 5 girl names in a table for each state for a given year. I needed to pull in multiple years, so I used Power Query and wrote a bit of M. I followed the guidance from a great blog post by Devin Knight:

  1. Choose From Web and enter the url for a page containing any year of data.
  2. Modify the M to create a function that accepts a year parameter and uses that to formulate the correct url for each year.
  3. Create a second query from scratch and provide an array of years.
  4. Convert this data to a table and insert a calculated column that pulls in the data from the first query.
  5. Expand the columns, rename columns, and change to appropriate data types where necessary.

For more detail, see Devin’s post. He does a great job of explaining and providing screenshots.

Once I had my data it was time to show it off.

Power Map

My first instinct was to use Power Map since its specialty is showing changes in geospatial data over time. After trying each type of chart in Power Map, I found that the region shading was the closest to what I wanted.

Power Map Baby NameAlthough I wasn’t just trying to duplicate the animated gif, I found it difficult to display this data in an equally effective way in Power Map due to the lack of ability to add data labels or customize annotations. There was no good way to show the most popular name on the state. I chose to use annotations to display the name. You can use data fields to populate the title and description of an annotation. The biggest drawback of using annotations is that there is no way to resize or relocate them. If I were to add annotations for each state, they would overlap and you wouldn’t be able to read many of them nor would you be able to see most of the map. I was able to add enough annotations that there was usually one annotation for each of the names shown on the map at any time. One other issue I noticed is that occasionally an annotation will fail to update on time, so the data in the annotation will be old and correspond to a previous year.  If you watch the video for my Power Map, you will see that the annotation for Virginia gets stuck on 1983 for a few years, but it eventually catches up. Update: Microsoft responded to my question on this.  We figured out that the annotations will update as expected if you set the Time to Date(Year) and then change the time settings to “Data Shows for an Instant”.  I have updated my workbook with these settings. You can watch the video of my Power Map tour here. Outside of the inability to add good labels, watching the states change color over time works well. You can have a legend that shows the names with the colors, but it will show every name from the data set and will not be an effective use of space.

Power View

Power View provides the ability to create bubble maps. I made a bubble map tiled by year.

Power View Baby NameIf you click on a year tile, you can then use the arrow keys to flip through the years; you can also use the scroll bar below the tiles. I think this provides a better visual than the Power Map for this situation since it is easier to see which names go with which states. Because of the bubble colors, I can see how the trend went from Jennifer to Jessica to Ashley in the 1980s. We also notice that there are 5+ top names from the 1990s forward, whereas the 1980s had 2 – 4.  The only real problem I encountered with the map in Power View was the limitation of the colors in the themes. When there are several names on the map, some of the colors can be very similar.  I tried several themes and couldn’t find one where this wasn’t the case.

Pivot Table

Next I decided to stray from maps and see if there were other chart choices that suited this data.  I created a pivot table with data bars and a timeline slicer.

Excel pivot baby nameI can still tell which baby names were most popular across the US and see how the trends change over time. The piece of information that I lose in moving away from maps is seeing how trends affect states that are geographically near each other. For instance, I can tell that Jessica was popular in 1985 in states such as California, Florida, New York, and Michigan. But I don’t immediately recognize that Jessica was popular only in New England, Florida, and the Mountain West and West Coast because Ashley had taken over the    Mid-Atlantic and Midwest. One advantage of the pivot table is that I can select multiple years. So I can learn things like Ashley was the most popular name overall in the 1990s.

Adventures with Apps for Office

Microsoft Research published several free apps for data visualization in Excel.  I decided to use a couple of them to visualize the baby name data. You can access these apps on the Insert tab.  Click Apps for Office -> See All.  Then click Find More Apps at the Office Store. One thing to note about these apps is that they require the data to be formatted in a certain way.  See the instructions in the links below for details on this.

apps for office

Streamgraph

A streamgraph shows how a set of numbers has changed over time using their relative area. It’s kind of like an area chart that doesn’t use an absolute Y axis.

steam graph baby names

I do not think this is an effective way to display the data.  Area graphs and streamgraphs both fail when trying to display a large number of values.  In this specific case, you can only pick a color and the shades are determined for you. It’s difficult to distinguish some of the shades from others. The fact that the values for the names can overlap and aren’t a specific shape (like a rectangle where you only have to judge length) make it difficult to interpret quickly.  You lose most of the advantages of pre-attentive processing because you really have to study the graph to get information from it.

Treemap

A treemap is a hierarchical view of data that breaks it into rectangles and uses size and color to communicate information.

tree map

For some reason there is a size limitation on this treemap so I couldn’t make it big enough to where all the names would fit in the boxes. Since the treemap plots the boxes so they fit into the overall rectangle, years are not placed in sequential order.  In my treemap, both the color and size represent the number of count of names for the year. Using a treemap removes the ability for the viewer to see trends over time.  I have to look in the bottom row for 1980 – 1982 and then jump to the top left to find 1985. I also had to summarize to top names by year, losing any geographical/state information.  One thing this does do well is compare 2 – 3 values for a name across years.  For instance, I can compare the shade of green for Jennifer in 1981, 1983, and 1985 to see the decline in popularity of Jennifer over time.

And the winner is…

I think the Power View does the best job of communicating all of the available information from the data set while being visually appealing. We can see geographic relationships and temporal relationships.  It’s easy to read and quickly see which names were top in which states at any year.

You can download my Excel file here if you would like to check it out or try to improve upon it.