Microsoft Technologies, Power BI, Power Pivot, Power Query, SSAS

Documenting your Tabular or Power BI Model

If you were used to documenting your SSAS model using the MDSchema rowsets, you might have noticed that some of them do not work with the new tabular models. For example, the MDSCHEMA_MEASUREGROUP_DIMENSIONS DMV seems to just return one row per table rather than a list of the relationships between tables.

Not to worry, though. With the new compatibility level 1200 SSAS Tabular models came some new DMVs. They are, unfortunately, not documented at this time (as of 3 Oct 2016). I expect them to show up here at some point in the future. Until then, I have published a Gist that provides a list of those views as well as some notes on what kind of useful info is in each one. While there are 36 TMSCHEMA DMVs, the main DMVs you will want to check out for model documentation purposes are:


There are a few fields that show IDs rather than descriptions (e.g., ExplicitDataType in TMSCHEMA_COLUMNS, Type in TMSCHEMA_DATA_SOURCES). I’ve been able to figure out the descriptions that correspond to some of the values, but I’m still missing a few.

As with the MDSCHEMA DMVs, the TMSCHEMA DMVs can be used to document your model. This also works with Power BI models. You’ll just need to change the connection information to your Power BI instance while the .pbix file is open in Power BI Desktop. For more information on analyzing your Power BI model, see Chris Webb’s post. Since DMVs require DMX queries, you are somewhat limited in your ability to join the data from the DMVs together. This makes it necessary to store the data somewhere before transforming and merging it. Since I needed to document a tabular model for a client, I created a Power Pivot model in Excel 2016. You can download my documentation model here. My example model was connected to a quick demo SSAS model I made based upon the Wide World Importers DW database.
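For reference, a DMV query is issued through an MDX query window in SSMS connected to the Tabular instance, using a restricted SQL-like syntax with no joins (the limitation mentioned above). A minimal example against one of the TMSCHEMA DMVs:

```sql
-- Run in an MDX query window in SSMS connected to the SSAS Tabular instance
-- (or to the local Power BI Desktop instance while the .pbix file is open)
SELECT * FROM $SYSTEM.TMSCHEMA_MEASURES
```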

My Power Pivot Model

My Power Pivot model uses Power Query to import the data from the DMVs. This enables me to merge data into a model that I feel makes the most sense and doesn’t suffer from too many relationships and hidden tables (which is what would happen if you just imported the results of each DMV query into the model). In order to make this work for any model, I employed a trick I learned from Chris Webb to avoid duplicating connection information across queries. I built my model with the normal embedded connection strings and then converted them to use the Connection Info tables. As I did that, I ran into a Power Query limitation, as evidenced by the error message below.

“Formula.Firewall: Query ‘QueryName’ (step ‘StepName’) references other queries or steps and so may not directly access a data source. Please rebuild this data combination.”

Ken Puls has a great explanation of how to get around this issue. Basically, you have to redesign your queries so that you have “staging tables”, which you can then use to build finalized tables. This redesign gets you around the error since the finalized tables don’t have the embedded source connection, just a reference to the results from another query.
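A minimal sketch of that staging pattern in M (the query names and the renamed column here are illustrative, not taken from the actual workbook):

```
// Query "Staging Tables" – the only query that touches the data source,
// using the values stored in the connection info / parameter queries:
let
    Source = AnalysisServices.Database(TabularInstanceName, TabularDBName,
        [Query = "SELECT * FROM $SYSTEM.TMSCHEMA_TABLES"])
in
    Source

// Query "Tables" – the finalized table; it references only the staging
// query, so it never accesses the data source directly and avoids the
// Formula.Firewall error:
let
    Source = #"Staging Tables",
    Renamed = Table.RenameColumns(Source, {{"Name", "Table Name"}})
in
    Renamed
```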

This Excel file documents:

  • Database and model info
  • Tables and source queries
  • Columns
  • Hierarchies
  • Measures
  • KPIs
  • Security roles, membership, and permissions
  • Relationships
  • Perspectives and the columns, hierarchies, and measures they contain

The Database Info tab uses cube functions to display the values shown. The rest of the tabs use pivot tables and slicers.

You can download and use this model if you have Excel 2016 or 2013 with Power Query. Just download the file, change the values in the TabularInstanceName and TabularDBName queries, and refresh. I was able to open the file and update the queries with no issues and no changes in Excel 2013.

This is great if you just need documentation or a quick way to look up specific information. But we could really analyze this using visualizations other than tables in Power BI Desktop.

My Power BI Model

I imported the Excel model into Power BI, made a few adjustments, and then added some visualizations. This report is a bit pared down from a visualization standpoint compared to the Excel file because I eliminated some of the data that wasn’t analytically relevant for me.

Click here to view this report in Power BI.

On the first page, I have several summary numbers that tell me about the contents of the model. It gives me a decent idea of the size and complexity of the model.


The second page uses a bar chart as an interactive filter against a list of table source queries and columns.

The third page shows relationships between the tables. Slicers on the left filter both the force directed graph and the table. Smart filters at the top of the page let you filter the Table and To Table in the relationships.

The fourth page shows the details behind each measure. A column chart at the bottom provides a count of measures by table and whether the measure is hidden, which acts as another interactive filter.

The last page shows security by role and table, again using column charts as interactive filters.

Check out the Demo Day video I made on the BlueGranite blog for further explanation and a demo of the Tabular Model documentation.

Update 19 Sep 2017: I have turned the Power BI report into a Power BI Template for easier use. You can download it here.

DAX, Microsoft Technologies, Power BI, Power Pivot

Using Context To Traverse Hierarchies In DAX

My friend and coworker Melissa Coates (aka @sqlchick) messaged me the other day to see if I could help with a DAX formula. She had a Power BI dashboard in which she needed a very particular interaction to occur. She had slicers for geographic attributes such as Region and Territory, in addition to a chart that showed the percent of the regional total that each product type represented.  The product type was in the fact/data table. Region and territory were in a dimension/lookup table and formed a hierarchy where a region was made up of one or more territories and each territory had only one region.

Territory Hierarchy

The requirement was that if a user selected no regions and one territory, the chart should show the % contribution of each product type for the region in which the territory is located.

PBI Undesired Behavior

Back in the MDX days this would be easy because we could go up the hierarchy (assuming the hierarchy is called Geography) using something like [Geography].[Geography].CurrentMember.Parent or Ancestors([Geography].[Geography].CurrentMember,1).

I miss MDX

DAX doesn’t have this same capability, so you have to approach it a bit differently.

I made a quick demo model to mimic the circumstances.

  • The Geography table contains an ID as well as Region and Territory.
  • The Product table contains an ID and Product Name.
  • The Product Type is actually in the fact/data table along with the Sales Amount.
  • The relationships between the tables are many:1 from the Sales table to the lookup tables, both with cross filter direction set to single.

The ultimate calculation we wanted was <regional product type sales amount> / <region sales amount total for all product types>.

The initial calculation we started with was:

Percent Of Region Total :=
DIVIDE (
    CALCULATE (
        SUM ( 'Sales'[Sales Amount] ),
        ALLSELECTED ( 'Geography'[Region] ),
        ALL ( 'Geography'[Territory] )
    ),
    CALCULATE (
        SUM ( 'Sales'[Sales Amount] ),
        ALL ( 'Sales'[Product Type] ),
        ALLSELECTED ( 'Geography'[Region] ),
        ALL ( 'Geography'[Territory] )
    )
)

Notice that the numerator and denominator are the same, except the denominator overrides the query context for Product Type. This calculation worked fine when a single region was selected and no territories were selected. Once a territory is selected, it limits the scope to just that territory rather than the entire region, which is not desired.

In order to make sure I understood the query and filter context, I added two calculated measures:

TestFilters := IF ( HASONEFILTER ( 'Geography'[Region] ), "1", "Many" )
TestValues := IF ( HASONEVALUE ( 'Geography'[Region] ), "1", "Many" )

I put these measures in a pivot table along with my product types and my geography slicers, and I selected no regions and one territory. The TestFilters measure returned “Many”, but the TestValues measure returned “1”. So I had my first clue as to how to find my desired region.

Filters/Values Test

Next I worked on my numerator in my final calculation. I knew my measure would include the sum of Sales Amount, so I started with that and added CALCULATE() to it in order to change the context. I needed to remove the territory context, so I started with Calculate(Sum('Sales'[Sales Amount]), ALL('Geography'[Territory])).

Then I needed to group those results by region and filter to only the desired region, which I did using MAXX(VALUES(Geography[Region])). If I selected a single region or a single territory, I would only get one region back in the values. MAXX() was used to retrieve a single result rather than a table. My numerator ended up as:

Region Contr Numerator :=
MAXX (
    VALUES ( Geography[Region] ),
    CALCULATE ( SUM ( Sales[Sales Amount] ), ALL ( Geography[Territory] ) )
)

This way, the context of the region corresponding to the selected territory is saved outside of the CALCULATE() in the VALUES(‘Geography'[Region]), but it was removed inside the CALCULATE() to provide the region total. This formula works when no territories and one region is selected as well as when no regions and one territory is selected.

The denominator is written similarly with the addition of the ALL('Sales'[Product Type]) filter inside of the CALCULATE().

Region Contr Denominator =
MAXX (
    VALUES ( Geography[Region] ),
    CALCULATE (
        SUM ( Sales[Sales Amount] ),
        ALL ( 'Sales'[Product Type] ),
        ALL ( Geography[Territory] )
    )
)

To finish up, I created my final calculation and added a bit of error handling for cases when no regions or territories are selected:

Region Avg =
IF (
    HASONEVALUE ( 'Geography'[Region] ),
    DIVIDE ( [Region Contr Numerator], [Region Contr Denominator] ),
    BLANK ()
)

These calculations provided the desired result.

Region Contribution Final

Note that this only works with the cross filter direction set to single in the relationships. Also, originally I had a Product Type attribute in my Product table, used DAX to add it to my Sales table (to replicate the situation we were solving for), and then forgot and put the Product Type from Product in my visuals. This clearly didn’t work because I didn’t set my context in the calculations using the field from the Product table.

Hopefully this will come in handy if you have some interesting filter/slicer requirements in your reports.

DAX, Excel, Microsoft Technologies, Power BI, Power Pivot

Improving Performance in Excel and Power View Reports with a Power Pivot Data Source

On a recent project at work, I ran into some performance issues with reports in Excel built against a Power Pivot model. I had 2 Power Views and 2 Excel pivot table reports, of which both Excel reports were slow and one Power View was slow. I did some research on how to improve performance to make the reports return results more quickly, and I found some great resources that I’ve compiled here in the hopes that they help you as well.

Purposeful Sorting of Source Queries

Alberto Ferrari suggests that you check the sorting and design of the tables you import into Power Pivot.  It seems that sort order does not significantly affect the final size of the data in Power Pivot, but it can affect performance when querying the model. I believe the goal is to order the columns based upon the number of distinct values ascending. I didn’t see a measurable improvement in querying my Power Pivot model after implementing this, but the benefits realized probably depend a lot on the type and size of your data.

Power Pivot Data Source Tips

André Van de Graaf has a great list of Power Pivot tips on his blog.  My favorites are:

  • Use views to import data from a database.
  • Import only the columns that are useful and analytically relevant.
  • Reduce the number of rows to import by including only those you are going to analyze (e.g.: perhaps you only need the last two years of data instead of five).

I used database views to specify only the columns and rows I needed in addition to performing any calculations that could be done on the row-level data.  Putting the calculation in the SQL view is often more efficient (and easier for DAX beginners) than creating a calculated column in a Power Pivot table.
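As a sketch of that approach (the view, table, and column names here are invented for illustration, not from the client project):

```sql
CREATE VIEW dbo.vw_SalesForPowerPivot
AS
SELECT
    s.OrderDateKey,
    s.ProductKey,
    s.SalesAmount,
    -- row-level calculation done in SQL instead of a DAX calculated column
    s.SalesAmount - s.TotalProductCost AS GrossMargin
FROM dbo.FactSales AS s
-- import only the rows you are going to analyze (e.g., the last two years)
WHERE s.OrderDateKey >= 20130101;
```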

Query Performance and Slicers

Power Pivot Pro has a good blog post about report performance when using slicers.  It includes some great tips:

  • Avoid using slicers for fields with many distinct values or fields that come from very large (usually fact/measures) tables.
  • Reduce the number of slicers.  If you can’t reduce the number of slicers, consider turning off cross-filtering on the slicers.
  • Reduce the number and complexity of measures.

It turns out that slicers issue 2 queries each: one to get the list of values and one to determine if those values have related data in the connected pivot table(s).  Slicers are a wonderful usability improvement for report consumers, and I think you should use them in place of report filters whenever it makes sense.  But they can slow performance to an unacceptable level. This is why you may need to remove a couple of slicers from a report that has several. Or you might just turn off cross-filtering.  Since the query for the pivot table basically has to be executed to determine which of the slicer values have related data, simplifying the measures and calcs used in your pivot table helps as well.

My Power Pivot model contained sales data by product. Those products were segmented into 3 product categories.  I had a product category slicer in my report for which I turned off cross-filtering.  With only 3 values, the cross-filtering wasn’t as valuable as the performance improvement achieved by turning it off.

Performance Tuning of Tabular Models

I also re-read the white paper on performance tuning SSAS tabular. Some of the suggestions weren’t applicable since I didn’t control the server where the Power Pivot model sits (my workbook was stored in Office 365). But the tips on changing the DAX calculations and the model are helpful:

  • Convert calculations into calculated columns.
  • Avoid error handling functions.
  • Detect filters using optimized functions like HASONEVALUE() and HASONEFILTER() rather than COUNTROWS().
  • Use CALCULATE() instead of FILTER().
  • Decide whether to use ISBLANK() versus ISEMPTY().
  • Use natural hierarchies, if possible.

Calculated columns are computed when the model is processed (refreshed) whereas calculated measures are computed during query execution. So if your queries are slow, you may want to attempt to use a calculated column to simplify a complex calculated measure, which should improve query execution times.
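As a hedged illustration of that trade-off (the table, column, and measure names are invented): logic moved into a calculated column is evaluated once per row at refresh, leaving a much simpler expression to evaluate at query time.

```
-- Calculated column, computed when the model is processed:
Is Large Sale = IF ( Sales[Sales Amount] > 1000, 1, 0 )

-- Measure, computed at query time, filtering on the precomputed column:
Large Sale Qty := CALCULATE ( SUM ( Sales[Quantity] ), Sales[Is Large Sale] = 1 )
```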

I used HASONEVALUE() in several calculations that dealt with time intelligence. We did a lot of year over year comparisons, so I had formulas to compare a 4 week period to the same 4 week period in the prior year based upon the fiscal calendar.

Qty Sold Last 4 Wks Prior Yr := IF ( HASONEVALUE ( 'End Date'[FiscalWeekKey] ),
 CALCULATE ( [Total Quantity Sold], FILTER ( ALL ( 'End Date' ),
 'End Date'[DaySeq] <= MAX ( 'End Date'[DaySeqLastYear] )
 && 'End Date'[DaySeq] > MAX ( 'End Date'[DaySeqLastYear] ) - 28 ) ), BLANK () )

HasOneValue checks to make sure the row or column in the pivot table where this calculation is used includes only 1 week so that I can be sure I have the correct 4 week period.

This project reinforced the fact that the DAX CALCULATE() function is very powerful and usually much faster than FILTER(). I had a few calculations that started out taking minutes to return for a model that had one fact table with about 8 million rows and 3 small dimension tables.  I rewrote the calculations to remove the FILTER() function and just use CALCULATE() with the same criteria in the filter clause, and they returned the correct values in under a second.


In the end, I used a combination of these tips for my project.  The changes that had the biggest impact were:

  • I reduced the number of rows I was importing.  I originally was pulling all data from 1/1/2010 forward, but I changed it to show the current calendar year plus the previous 3 years.
  • I turned off cross-filtering on some of my slicers.
  • I rewrote some calculated measures to use CALCULATE() and remove FILTER().

As PowerPivotPro said almost 3 years ago, slicer cross-filtering is very often the #1 source of slowdown. It is still true today, and it will be one of the first places I look when I am faced with performance issues in Excel reports in the future.

Data Visualization, DAX, Excel, Microsoft Technologies, Power BI, Power Pivot, Power View

Color Coding Values in Power View Maps Based Upon Positive/Negative Sign

Power View can be a good tool for interactive data visualization and data discovery, but it has a few limitations with its mapping capabilities.

  • Power View only visualizes data on maps using bubbles/pies.
  • The size of the bubble on a Power View map can be misleading during analysis for data sets with small values and negative values.
  • By default, Power View can only display one measure on a map, which it uses to determine the size of the bubble.

Microsoft offers other mapping capabilities in Power Map and SSRS maps, so if all you need is a map, you can explore those other tools to determine if they better meet your needs. Check out my presentation on Choosing Your Geospatial Mapping Tool for more info on requirements, capabilities, and limitations of maps in Power View, Power Maps, and SSRS. If you want to see a map in the context of other charts, and you want the interactive highlighting and filtering of Power View, you may look for workarounds to the second and third issues listed above.

I recently had a client express disappointment about the display of negative values in Power View maps. A negative value simply gets a small bubble, and positive value gets a larger bubble. You have to hover over the bubbles to see each value and understand the bubble size that represents the change in sign from positive to negative. This gets a little tedious and minimizes the usefulness of the bubble map in the first place. The example Power View map below shows the year over year change in trespassing incidents in my city from 2013 to 2014. The small bubble has a value of -16, and the large bubble to the right has a value of 6.

PV Map Neg Values

One way to better present this type of data in a Power View map is to have positive values represented in one color and negative values represented in another with the bubble sizes showing the absolute value of the change. This proved to be less than straight forward as I worked my way through it. You can’t just make separate measures for the positive and negative values because Power View won’t let you put two measures on a map in that manner. You can’t simply create a calculated measure that determines the sign of your values and put it in the legend because Power View is expecting categorical data for the legend and it won’t let you put the numerical data field in the legend. Power View also doesn’t recognize calculated measures that return a text value (they don’t show up in the field list), so you can’t create a measure that turns the sign into a text value (“Positive” or “Negative”) in an effort to make it more categorical.

But you can use dynamic segmentation to show the second measure as a dimension. There is a great video on MSBI Academy that walks you through this. Although the video shows two different measures on a map, you can use this method to color code positive and negative values of a single measure. First, create a disconnected table for the number sign attributes. It should contain at least:

  • a text field with a descriptor
  • a field for the minimum value of the range
  • a field for the maximum value of the range

The video shows storing that table in Excel (which is great if you plan to make changes), but you can now also paste the table directly into Power Pivot without any link to the table in Excel. My final table looks like the image below.

PV Map Sign Table

Next you need a calculation that uses the disconnected table and displays the absolute value of the measure you want to use for the bubble size.

Change In Incidents YOY Pos/Neg :=
     CALCULATE (
     ABS ( [Change in Incidents YOY] ),
     FILTER ( 'Sign', [Change in Incidents YOY] >= Sign[Min]
     && [Change in Incidents YOY] < Sign[Max] ) )

Then you can build your map.  Use the Pos/Neg field for color, the new calculation for size, and your location fields of choice.

PV Map Pos Neg 1 Layer

Note that you will now see a warning message in the Power View fields list that relationships may be needed. You can disregard that. The use of a disconnected table is the cause of that message, and everything is working as intended.

PV Rel Msg

If your map has no drill down levels, you can stop here. Your map will now have the values for positive and negative in the legend with bubbles of two colors on the map. A bubble whose signed value is 5 will be the same size as a bubble whose signed value is -5 with the color indicating the sign of the number. In this analysis, a positive number indicates an increase in trespassing, so I chose a theme and ordered my values in the sign table such that the positive values were assigned the brighter color. These are the bubbles that need attention since trespassing is rising in the related zip codes.

If you do have drill down levels in your map, you have a couple more steps. You need another measure that deals with the drill down path. Without this extra step you end up with pies and unexpected values.

PV Map Pos Neg 2 Layer Bad

My map allows the user to drill down from zip code to beat. My new calculation references the top level of my drill down path ([Zip Code]), the calculation I just created ([Change In Incidents YOY Pos/Neg]), and the second level of the drill down path in my map ([Beat]). Note that this calculation is specific to the drill down path used in the map. If you have several maps with different drill down paths, you will need to create several calculations for this to work.

Change In Incidents YOY Pt 2:=
     IF(COUNTROWS(ALLSELECTED('KCPD Crime Data 2014'[Zip Code]))=1, 
     [Change In Incidents YOY Pos/Neg], 
     CALCULATE([Change In Incidents YOY Pos/Neg], 
     ALLSELECTED('KCPD Crime Data 2014'[Beat])))

Now I can put this new measure in place of the original one, and the pies will go away, returning the expected color coded bubbles and values.

PV Map Pos Neg 2 Layers SBS

Data Visualization, DAX, Excel, Microsoft Technologies, Power Pivot

Power Pivot: Dynamically Identifying Outliers with DAX

I have been working on a project in which we were looking at durations as an indicator of service levels and customer satisfaction, specifically the maximum duration and average duration. We built a small dimensional data mart in SQL Server to house our data, then pulled it in to Excel using Power Pivot so we could make a few Excel and Power View reports and provide the client with the model for further ad hoc reporting and analysis. After the first iteration of development, we got feedback from the client that there were several outliers that were skewing the metrics. They wanted to see average and max duration for whatever group of objects they had chosen and then see the calculations after removing the top and bottom 5% of values.

Here is a small sample of my fact table, called Object Processing.  Field names and some small details have been changed to protect the innocent.

ObjectKey  StatusKey  CreationDateKey  GeographyKey  Duration
1          1          20140501         1             2193218
2          1          20140503         2             2114354
3          2          20140429         3

The first four columns are foreign keys to dimensions. The object dimension contains an object ID, name, owner, and content group. The status dimension contains an ID and description of the current processing status. The date dimension is a standard date dimension representing the date the object was created. The geography dimension contains an ID, country, and region. The duration column represents the number of seconds the object has spent processing. A new row is inserted into the fact table for each object created.

First, I have a few simple DAX calculations. These calculations work great on their own until you need to remove the outliers:

Total duration:= Sum('Object Processing'[Duration])
Avg duration:= Average('Object Processing'[Duration])
Max duration:= Max('Object Processing'[Duration])

To remove the outliers we need to rank the objects by duration. I first read Alberto Ferrari’s blog post on dynamic ranking in Power Pivot. It demonstrates a method to rank items within the table as a calculated column, which didn’t quite work for me because I wanted to dynamically rank items after filters were applied (e.g., rank only the objects with a create date of yesterday and a status of completed). Then I found a series of posts about Dynamic TopN Reports Via Slicers (parts 1, 2, 3, 4). That was what I needed to work through my similar problem.

First, I created a calculated measure for ranking the objects. If an object had no duration (because it had been created but had not yet reported a duration), I wanted it to be blank rather than rank as the lowest duration. In this formula, I’m ranking Object IDs in descending order based upon my total duration calculation.

Object Duration Rank:=
           if (ISBLANK([Total Duration]), BLANK(),
           RANKX(ALLSELECTED(Object[Object ID]), [Total Duration], ,0))

Next, I needed to identify the top and bottom 5%. First, I counted the total number of objects excluding null durations:

Object Count For Outliers:=
               CALCULATE(DISTINCTCOUNT('Object Processing'[ObjectKey]),
               ALLSELECTED('Object Processing'),
               NOT(ISBLANK('Object Processing'[Duration])))

Then I calculated the number of objects in the top 5%, giving me the rank of the last member of my top outliers group:

Outlier Count:=round([Object Count For Outliers]*.05,0)

To get the rank of the first member of the bottom outliers group I calculated:

Low Outlier Rank:=[Object Count For Outliers]-[Outlier Count]

Finally, I wrote a calculation that determines if the object is in range (not an outlier):

Is In Range:=
               if(ISBLANK([Total Duration]), BLANK(),
               if([Object Duration Rank]>[Outlier Count] && 
               [Object Duration Rank]<[Low Outlier Rank], 1, 0))

Now I could filter my pivot table containing the Is In Range calculation for only those values equal to 1 to see the average and max duration with outliers removed.

Ultimately, I produced a summary report similar to this:

Object Duration Summary

We were able to compare object durations for objects created in the last 1, 7, and 30 days, including and excluding outliers. The conditional highlighting is set so that cells are highlighted if the duration for that time period is greater than the duration for the prior time period listed below it. For example, average object duration for yesterday is 5:06:02, which is greater than the average duration for the last 7 days of 4:54:14.

I’m still working on a DAX calculated measure that will return the average object duration excluding outliers in one fell swoop, rather than needing to filter out the outliers in a pivot table.
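One possible sketch of such a measure, building on the calculations above; I haven't validated it against the real model, so treat it as a starting point rather than a finished formula:

```
Avg Duration Excl Outliers :=
AVERAGEX (
    FILTER (
        VALUES ( Object[Object ID] ),
        [Object Duration Rank] > [Outlier Count]
            && [Object Duration Rank] < [Low Outlier Rank]
    ),
    [Total Duration]
)
```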

Excel, Microsoft Technologies, Power BI, Power Pivot, Power Query

Moving Calculated Measures in Power Pivot for Excel 2013

I learned a lesson the hard way: I shouldn’t change field names and data types in Power Pivot on tables that were imported using Power Query. My changes broke the connection between the two tools, so when I refreshed a query in Power Query that was set to load the results to my data model it caused a new/duplicate table to be created in my data model instead of updating the original table.


I already had Power View reports built off of my model that I didn’t want to rebuild, and I didn’t want to leave a duplicate table in my model, but I needed to make the data model refreshable. The issue can be fixed, but it’s a bit tedious:

  1. Create relationships for the new table to match those of the old table
  2. Rename the old table to something else
  3. Rename the new table to the original table name (usually it has a 1 after the table name)
  4. Rebuild or move any calculated measures to the new table
  5. Delete the old table

I was fine until I got to #4. You can’t just copy a calculated measure to a new table by cutting and pasting. You can copy the formula from the formula bar and paste it into a cell on another table, but you will get a warning that you can’t have two measures with the same name.


Going this route means pasting your measure, renaming the old measure, and then renaming the new copy of the measure to the original name.  Because Power Map lacks the ability to filter data, I had to create several calculated measures in my data model to limit the data shown in my map.  The idea of copying, pasting, and renaming several dozen measures was less than appealing.

It should be noted that in the current version of Power Pivot, you can put calculated measures on any table in the model without affecting the result of a query using the measure. The table in which it is located is more of an organization issue rather than a technical calculation issue.

I thought there must be a better way to move a measure. I looked through all of the buttons on the ribbon in the Power Pivot window and saw nothing helpful so I turned to Google (yes, I still choose Google over Bing). A TechNet article provided just the guidance I needed:

“To move a measure, you can cut and paste the formula in the Calculation Area. You can only move it to a different location in the Calculation Area of the table in which it was created. You cannot move a measure to different table in this view; instead use the Measure Settings dialog in the PowerPivot ribbon in Excel to change the association.”

The Measure Settings dialog box held the answer, although I found these settings under Calculated Fields -> Manage Calculated Fields.

PPBlogManageFields

It contains a drop-down box that allows me to choose the table where the definition of the measure should be stored. This was a much quicker way to quickly move several calculated measures to another table in Power Pivot.


I have built several Power Pivot models and never run into this particular situation. I can’t say I’m glad it happened, but I can appreciate that it reminded me that there are useful things on the Power Pivot ribbon in the Excel window that aren’t accessible through the Power Pivot window.

Data Visualization, Excel, Microsoft Technologies, Power BI, Power Pivot, Power View

Infographic vs Power View

Someone on Twitter posted a link to an infographic on the 10 most visited cities in the world, which you can see below.  I’m interested in travel, and I like data, so I looked through it.

Most Visited Cities Infographic
Most Visited Cities Infographic

After a second or two of looking at it, my BI and dataviz nerdiness kicked in. Here were my thoughts:

  1. I know infographics are cool, and I love that they communicate an interesting data point to the general public in an easily consumable manner.  But I hate when they make them really large/long with a lot of text where you have to read through every bit of text to really get the info.  You lose a lot of the opportunities for preattentive processing with a format like this.
  2. I know I could make a great Power View from this data that could deliver information more efficiently, and I think it would look pretty cool.

Making the Power View

First, I had to get the data and put it in a Power Pivot model. I just copied the info from the infographic and put it in a single table in Excel.  You could split it out and put it into several tables (especially if you split the tourist attractions into 3 individual fields per city), but I didn’t need it for what I was making and the data was all at the same level of granularity in my single table.

Most Visited Excel table
Click to enlarge

I imported that into my Power Pivot model by selecting the table and clicking the Add to Data Model button on the PowerPivot tab. I had a vision of where I was going with my Power View, so I also imported a table of countries and flag images into Power Pivot. See yesterday’s post for info on how to import images into Power Pivot. I had originally split out the hotel price and the hotel name as two separate fields, but I found that I liked them together when I was building my Power View, so I made a calculated field in Power Pivot to take care of that. Here are my tables in Power Pivot:
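That calculated field can be a simple concatenation. A hedged sketch in DAX, with [Hotel Name] and [Avg Cheap Hotel] standing in for whatever the actual column names are in the model:

```
// Calculated column combining hotel name and nightly rate;
// DAX implicitly converts the numeric price to text for the & operator
= [Hotel Name] & " - $" & [Avg Cheap Hotel]
```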

Most visited power pivot 1
Click to enlarge
Most visited Power Pivot 2
Click to enlarge

You’ll notice on the Country Flags table, I had to change the table behavior properties to get the images to render properly in Power View. You can find this on the Advanced tab in Power Pivot. The important part is to set the default image to the flag image. Next I had to relate my most visited cities data to the appropriate flag.  Since there is one row per country in my Country Flags table, this was as easy as relating the country field in the Most Visited Cities table to the country field in the Country Flags table.

Most Visited Power Pivot 3
Click to enlarge

And finally, I built my report. You can download my workbook here.

Most Visited Power View
Click to enlarge

I used a map and cards to display the data. In the map, the bubble size represents number of visits. If you click on a bubble you filter the table below to see just the information for that location. If you hover over a bubble you can see the city name and visits per year. The table is sorted by visitors per year descending.

Why I Like This Power View

Let’s try an experiment: open the file in Excel and then open the infographic in your browser. See how long it takes you to answer the following questions with the infographic and then with the Power View.

  1. How many of the most visited cities are in the United States?
  2. If I were to plan a trip to visit 3 of the cities on this list, which ones would be most convenient to visit in one trip?
  3. What tourist attractions are in Hong Kong?
  4. How many visitors does Dubai get per year?
  5. In what country is Kuala Lumpur located? (I’m hoping you know this off the top of your head, but let’s pretend you need to look it up.)
  6. Which country has the lowest average hotel rate?

I think the Power View answers all of these questions at least as quickly as the infographic.  If you have been looking at the infographic for a while, you might be pretty fast at #3, #4, and #5.  But you will be much slower at #1, #2, and #6 using the infographic.

There are two main reasons for this:

  1. The map provides visual information that you can take in quickly.  You can quickly see locations of these cities on the globe as well as their locations relative to each other. Because of preattentive processing (PDF) we can determine which places have the most visitors (size), which places are close to one another (spatial position), and where cities are located (color – the mauve bubbles stand out) without having to really think about it. I will note that both the Power View and the infographic do a good job of using enclosure to indicate which data points are related.  The Power View has the mauve bars to the left and the line above to separate each city.  I easily know that Malaysia is the country related to Kuala Lumpur and don’t confuse it with the information for Hong Kong above it.
  2. Interactivity allows us to quickly find information through sorting and filtering. For question #3, I just clicked on the bubble for Hong Kong to filter the table below and found that the tourist attractions in Hong Kong include Hong Kong Disneyland, Kowloon Walled City, and Ocean Park Hong Kong. To find the lowest hotel rate, I just sorted my table by the Avg Cheap Hotel field ascending and looked in the first card to find that Kuala Lumpur has hotels for $16. If you look for this information in the infographic, you have to scan through every box until you find Hong Kong, or look through all the boxes and remember the lowest hotel rate you have read.

Perhaps you think that it is unfair to compare an interactive Power View with a static infographic. I would argue that the comparison itself makes a point. When you build BI solutions, including reports and ad hoc data discovery tools, you should consider the needs of the end consumer to decide whether you need interactivity. Interactive infographics do exist; this author chose not to include that feature in her design. There are some accessibility advantages to the infographic: it can render in basically any browser on any device and can be downloaded as an image so that it doesn’t require internet connectivity. Power View maps currently use Silverlight, and the maps require an internet connection. The preview of Power BI in Office 365 has an HTML view, but it doesn’t yet work with maps, so you aren’t going to look at this Power View on an Android or iOS device.

I don’t think all infographics are inherently poorly designed; I don’t think this one is particularly bad.  I do think that the information could be better arranged to more effectively communicate to the audience. I recommend looking through Stephen Few’s presentation on Infographics and the Brain (PDF) for more info on this subject. I agree with Few that infographics are good when they simplify the data and tell a story to communicate a message with the data in addition to being entertaining.

Excel, Microsoft Technologies, Power BI, Power Pivot, Power View, SSAS, T-SQL


I’ve had several occasions to use OPENROWSET recently in T-SQL.  Although you can use it as an ad hoc alternative to a linked server to query a relational database, I’m finding it useful to get data from other sources. I’ll go into details of how to use it, but first I would like to acknowledge: OPENROWSET is not the answer to everything.  Sometimes you should use a linked server or SSIS or some other method.  I had some specific cases where I felt OPENROWSET made more sense for my task.
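As a quick illustration of the linked-server-alternative use, here is a hedged sketch of an ad hoc OPENROWSET query against another SQL Server instance; the server name, security option, and query are placeholders, and Ad Hoc Distributed Queries must be enabled (shown later in this post):

```sql
-- Ad hoc query against a remote SQL Server without defining a linked server.
-- 'OtherServer' is a placeholder; SQLNCLI is the SQL Server Native Client provider.
SELECT a.name
FROM OPENROWSET('SQLNCLI',
     'Server=OtherServer;Trusted_Connection=yes;',
     'SELECT name FROM msdb.dbo.sysjobs') AS a;
```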

Importing Data from an Excel Spreadsheet

Technical need: I had some data that needed to be validated before being imported into SQL Server from Excel as part of an upload process in an application. If it didn’t pass validation, I did not want to import the data. Rather than use SSIS to import the data into a table and then check it, I used OPENROWSET to pull the data directly from the Excel file into a temp table, ran all of my validation checks, and then imported the data into its final table, all within a stored procedure that was called by an application. Field and table names have been changed to protect the innocent.

The stored procedure:

-- Procedure name is illustrative; the real names were changed
CREATE PROCEDURE dbo.ImportExcelData
    @datafile nvarchar(127)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max)
    ,   @rowcount int;
    SET @rowcount = 0;

    DECLARE @FileExists int;

    BEGIN TRY
        EXECUTE master.dbo.xp_fileExist @datafile, @FileExists out;
        -- Value of 1 = file exists
        -- Value of 0 = file doesn't
        IF @FileExists = 0
        BEGIN
            SELECT CONCAT('The file ', @datafile, ' could not be found.')
                                                   AS reasontext
            ,   NULL AS field2
            ,   NULL AS field3;
            RETURN -1;
        END
    END TRY
    BEGIN CATCH
        SELECT ERROR_MESSAGE() AS reasontext
        ,   NULL AS field2
        ,   NULL AS field3;
        RETURN -1;
    END CATCH

    TRUNCATE TABLE dbo.finaldest;
    IF OBJECT_ID('tempdb..##tempf') IS NOT NULL
        DROP TABLE ##tempf;

    --Create temp table to hold values from Excel file
    CREATE TABLE ##tempf (
        Field1 [nvarchar](255),
        Field2 [nvarchar](255),
        Field3 [nvarchar](255),
        Field4 [nvarchar](255),
        Field5 [nvarchar](255),
        Field6 [nvarchar](255)
    );

    --dynamic sql to populate temp table with values from Excel file
    SET @sql = N'INSERT INTO ##tempf
                SELECT * FROM OPENROWSET(
                ''Microsoft.ACE.OLEDB.12.0'',''Excel 12.0;Database='
                + @datafile
                + ';HDR=YES;IMEX=1'',
                ''SELECT s.* FROM [Sheet1$] AS S
                WHERE [Field2] IS NOT NULL AND [Field3] IS NOT NULL'')';

    BEGIN TRY
        EXEC sp_executesql @sql;
    END TRY
    BEGIN CATCH
        SELECT 'File not found or invalid file type' AS reasontext
            --ERROR_MESSAGE()  AS reasontext
        ,   NULL AS Field2
        ,   NULL AS Field3;
        IF OBJECT_ID('tempdb..##tempf', 'U') IS NOT NULL
            DROP TABLE ##tempf;
        RETURN -1;
    END CATCH

    /*Do lots of data validation here*/

    --if data validation tests are passed, write to final destination table
    INSERT INTO dbo.[finaldest]
    SELECT [Field1], [Field2], [Field3], [Field4], [Field5], [Field6]
    FROM ##tempf;

    DROP TABLE ##tempf;
END

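Assuming the procedure is created as dbo.ImportExcelData (a stand-in name, since the real names were changed), the application call is just:

```sql
-- The file path is a placeholder; it must be reachable by the SQL Server service account
DECLARE @rc int;
EXEC @rc = dbo.ImportExcelData @datafile = N'\\appserver\uploads\newdata.xlsx';
-- @rc = -1 means the file was missing or validation failed;
-- the returned result set contains the reason text.
```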
To make this work you need to install the Microsoft ACE OLE DB 12.0 provider on your server. You cannot use the Jet OLE DB provider here; it is 32-bit only and does not work with 64-bit SQL Server. Once you have the ACE provider installed, you need to change some settings:

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',
N'AllowInProcess', 1;
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',
N'DynamicParameters', 1;
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;

Here is a blog post that explains the errors you might get if you don’t have these settings correct. It also notes that you can check your settings in the registry. For SQL Server 2012 the path is HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> MSSQL11.MSSQLSERVER -> Providers -> Microsoft.ACE.OLEDB.12.0.  You can then verify that AllowInProcess and DynamicParameters are set to 1.


This provider seems to be a bit fussy, and there are some things that took me a while to troubleshoot once I got started.  First, make sure that the location of the Excel file you are reading is accessible to the account that is running the SQL engine and doesn’t require admin access. The linked server errors that get returned with this really aren’t that helpful, so you have to use your Google-fu to work your way through it.  I used a global temporary table so it would be accessible across connections if it needed to be (I was having trouble when I used a local temp table).  The application that calls this stored procedure queues up the calls and executes them serially so one can’t override the other.

Returning results from an Analysis Services cube in SQL

Technical need: I had some data (several facts with common dimensions) with calculations in a multidimensional cube and an application that needed that data but couldn’t/wouldn’t consume the MDX result set.

The query: For demonstration purposes I’m using a small query that returns the values for a drop-down box for selecting a member of the date dimension.

select a."[Measures].[DateKeys]" as DateKey,
a."[Measures].[DateLevel]" as DateLevel,
a."[Measures].[DateValues]" as DateValue
from openrowset
('MSOLAP','Datasource=localhost; Initial Catalog=MyCubeDB;',
'with member [Measures].[DateValues] as
    Iif([Date].[Fiscal Calendar].currentmember.level_number = 2, "    "
    + [Date].[Fiscal Calendar].currentmember.member_value,
    [Date].[Fiscal Calendar].currentmember.member_value)
  member [Measures].[DateKeys] as
    [Date].[Fiscal Calendar].currentmember.uniquename
  member [Measures].[DateLevel] as
    [Date].[Fiscal Calendar].currentmember.level_number
select
 {[Measures].[DateValues], [Measures].[DateKeys],
                           [Measures].[DateLevel]} on 0 ,
 {Exists(Descendants([Date].[Fiscal Calendar].[FY2014], 1,
                         SELF_AND_BEFORE), , "MyMeasureGroup") } on 1
 from [MyCube]
') as a;

You can see that I explicitly named my fields, but you can also do a select * to get all the fields from the OPENROWSET. I had to put the fields from the cube in quotation marks to make it work because of the brackets in the field names. For Analysis Services, MSOLAP is the name of my provider.  You must have the correct version of this provider listed under the linked servers on the server where this is being executed.

MSOLAP Provider
For SQL Server 2012 SP1, you can download the Microsoft Analysis Services OLE DB Provider for Microsoft SQL Server 2012 SP1 here. There is nothing special about the data source and initial catalog; they are exactly the same connection info you would use when connecting to SSAS through Management Studio.  The last part is simply the MDX query.  You should be able to copy that part out, run it in Management Studio connected to your SSAS database, and get results.
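One way to confirm which OLE DB providers (including MSOLAP) are registered on the server is below. Note that sp_enum_oledb_providers is an undocumented but long-standing system procedure, so treat this as a convenience rather than a guarantee; you can also check under Server Objects -> Linked Servers -> Providers in Management Studio.

```sql
-- Lists the OLE DB providers registered on this SQL Server instance;
-- MSOLAP should appear with the version you installed.
EXEC master.dbo.sp_enum_oledb_providers;
```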

Inserting Images Into a Database

Technical need: I was building a demo for Power BI, and I needed to show some images in my Power View reports in Office 365.  In order for the images to show in Office 365, you must import the images into SQL Server and then import them into Power Pivot. First I downloaded images of flags for each country from the CIA World Factbook.  I took the folder of flags and put it in my Pictures folder. I already had a table in SQL Server that contained country names and codes, so I just needed to add a flag image for each country.

The query:

  CREATE TABLE [dbo].[countryflagsdemo] (
  [id] int,
  [country] varchar(100),
  [code] varchar(10),  -- country abbreviation used in the image file names
  flagimage [varbinary](max)
  );

 --get ID, country, and code from existing table
  INSERT INTO dbo.countryflagsdemo (id, country, code)
  SELECT id, country, code FROM dbo.Countryflag1;

  DECLARE @rows INT;
  DECLARE @i INT;
  DECLARE @code varchar(10);
  DECLARE @sql nvarchar(max);
  SET @i = 1;
--get max id value for loop; ids are assumed to run 1..@rows
  SET @rows = (SELECT COUNT(country) FROM dbo.countryflagsdemo);
  WHILE (@i <= @rows)
  BEGIN
    SET @code = (SELECT code FROM dbo.countryflagsdemo WHERE [id] = @i);
    SELECT @sql = 'UPDATE dbo.countryflagsdemo
      SET flagimage =
      (SELECT BulkColumn FROM OPENROWSET(BULK
            ''C:\Users\mlongoria\Pictures\flags\large\' + @code
               + '-lgflag.gif'', Single_Blob) as flagimage)
            WHERE code = ''' + @code + ''';'

    EXEC (@sql);
    SET @i = @i + 1;
    --PRINT @code;
  END


I figured out how to do this based upon this MSDN blog post. I inserted my other data first and then added the images.  I’m sure you could find a way to do this all in one select statement.  Since this was a one-time thing I didn’t see the need to find another way. It should be noted that the bulk insert must pull from a location on your computer; you can’t use links to the images on the internet. I downloaded the images in a folder. Each image name was the abbreviation for the country. If you are going to import this data into PowerPivot, be sure you change the table behavior properties so it will recognize the varbinary field as an image.
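A quick sanity check that each flag actually made it into the table (a sketch using the same column names as the script above):

```sql
-- NULL or tiny DATALENGTH values point to images that failed to load
SELECT country, code, DATALENGTH(flagimage) AS imagebytes
FROM dbo.countryflagsdemo
ORDER BY DATALENGTH(flagimage);
```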


There may be other ways to accomplish my goals, but it’s nice to understand how to use OPENROWSET and have that in my toolbox.