Category: Data Visualization

Free Data Viz Webinar May 17

Data visualization remains an important topic in analytics today, especially with the growth of big data and self-service BI. People with all kinds of roles and responsibilities need to communicate with data in the workplace, but most people don’t have the training to do so effectively.

The brilliant Jason Thomas and I are leading the May webinar for BlueGranite on Communicating With Data Visualization.  The webinar will cover:

  • Understanding data visualization as a form of communication
  • The process of creating a good data visualization
  • Using cognitive psychology to optimize your data visualization
  • Tips to ensure the success of data visualization projects

There will be lots of great tips that we’ve learned from leaders in data visualization such as Storytelling with Data, Stephen Few, and Andy Kirk. We’ll show several examples built with Power BI, Reporting Services, and Tableau to demonstrate how we apply data viz design concepts. And we’ll share what we have learned from our experiences building and deploying data visualizations as part of BI solutions.

Join us Tuesday, May 17, 2016 from 1pm to 2pm EDT. Register here.

Colorblind Awareness and Power BI KPIs

Update: The ability to change the color of a KPI was delivered in August 2016!

Color blindness, or color vision deficiency (CVD), affects 1 in 12 men and 1 in 200 women in the world. The chances are good that you have met someone who is colorblind, but you may not have realized it. I know at least two people in the #sqlfamily who are colorblind. There are many types of CVD, but the most common is deuteranomaly (red/green color blindness). Those with deuteranomaly have trouble distinguishing shades of red and green as well as some shades of gray, purple, and bluish green.

And which colors do we love to use with KPIs? Red and green, of course! Color is a very powerful tool in data viz. We use it to indicate meaning and to draw attention to something important. KPI boxes are used to display key metrics in an efficient manner. These key metrics are usually rather important, and our users need to be able to see their status at a glance.

I quite like the design of the KPI boxes in Power BI, but for some reason they were created without the ability to adjust the color associated with the state (good/bad). Shown below, they use the common red/green color scheme.

PBI KPI

This may look fine to you, but if you are a deuteranope (green-blind), it looks like this:

Deuteranope PBI KPI

That makes it much more difficult to tell how things are going. A deuteranope would have a hard time getting the same quick information from the KPI boxes. There are some visual clues in the KPI box to help those with CVD: an indicator (! for bad and ✓ for good) as well as the percent above or below the goal shown under the value. But both of those elements are rather small, and I have found no way to increase their size within the KPI box. For those without CVD, the color immediately communicates the state without requiring them to concentrate on the text within the KPI box. Color is a preattentive attribute: we notice it without conscious effort. But I need to be conscious of my audience’s ability to do the same when making my color choices.

The video below shows what Power BI KPIs look like through various types of CVD.

 

If you work at a small company where you know no one is colorblind, or if you use Power BI for personal projects only, you may not care. But if you are an organization that makes data viz for the general public, I encourage you to be aware of what your audience sees and adjust your color choices so you are effectively communicating to your entire audience.

Blue and orange is a recommended palette to avoid common CVD issues. Unfortunately, I can’t change the Power BI KPIs to blue and orange, nor can I change the intensity of the hues to make them more easily distinguishable. If you agree that this functionality should be available in Power BI KPI boxes and would like to make your voice heard, please vote for my suggestion at:

https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13545780-ability-to-change-kpi-colors-to-accomodate-colorbl

Microsoft watches the suggestions on that site and has already implemented several of them once they were shown to be popular. Please help me out and spare a vote or two for the color vision deficient among us.

Update (25 Apr 2016): Microsoft saw my request and has already committed to getting this functionality into an upcoming release! You can see Miljan Braticevic’s comment below, and the idea is already marked as planned at the link above. That was a quick response!

KCStat Chart Makeover

I live in Kansas City, and I like to be aware of local events and government affairs. KC has a program called KCStat, which monitors the city’s progress toward its 5-year city-wide business plan. As part of this program, the Mayor and City Manager moderate a KCStat meeting each month, and the conversation and data are live tweeted.  I love the program as well as all of the open data available on Data.KCMO.org. But as a data viz enthusiast, I cringe at some of the charts that are tweeted out from KCStat.  I wish that Kansas City would let me fix their charts – or better yet, let me come and teach a data viz class and facilitate a chart makeover session (if anyone from the city of KCMO is reading this, I would gladly donate my time to do this).

This week, a particular chart from KCStat caught my eye.

Story

There is an interesting back story to this chart. KCMO invited citizens to attend Citizen Work Sessions, in which they organized into small groups and played a game to learn about the city budget and provide feedback. In the first scenario of the game, each group was asked to allocate a $100 budget across the city’s five major goals (as laid out in KC’s Citywide Business Plan) without reference to specific programs within the goals. In the second scenario, they were given a total budget of $1000 allocated to match the current city budget and were asked to identify the top 3 – 5 priority programs under each goal and then cut $50 from individual programs (as opposed to just the broad goals).  The chart from the tweet represents the results of the game.

Chart Design

The table shows the starting allocation and the average final allocation after the $50 cut (averaged because multiple groups performed this exercise). The line in the chart is tied to the starting allocation, and there are several markers that show the values assigned by the individual groups after the budget cut (which, averaged together, equal the final average amounts in the table).

For me, this chart is ineffective and cluttered.

  • It’s using a line chart for categorical comparisons. With rare exceptions for multivariate analysis, slope charts, and maybe a few other situations, line charts are best used to show trends over time.
  • The markers are all bunched together to the point that they really aren’t useful, especially given the scale of the y-axis and the size of the markers. It would be nice to see the individual markers if there were any dramatic outliers, but that is not the case with this data.
  • The placement of the table in the middle of the chart between the axis labels and the data points is distracting. If you include axis labels rather than directly labeling, it is because you want to allow users to see the scale and follow a label across to a data point. But the line your eyes follow with this chart is interrupted by the big blue table in the middle of the graph.
  • I believe the table column header for “Ending Average Final (after cutting $50m)” is mislabeled. It should be after cutting $50, since we are dealing with only $1000 to begin with. I thought maybe it was accidentally scaled to the city’s real budget amount, but it looks like the Governmental Activities budget is about $953 million, so that didn’t make sense. Either way, it confused me, and I eventually decided it was mistyped. Especially with this kind of subject matter (budgets and government communication to the public), you want to be as clear as possible and review your charts to remove any possible confusion.
  • The data in the chart is the least emphasized thing in the image. The thing that is most eye-catching is the header row of the table or perhaps the graph title due to the use of bold colors.
  • Goal areas are abbreviated in the table and wrapped on multiple lines in the graph x-axis.
  • This chart could really use some context. I had to dig through the presentation and the city’s open data to understand the situation that explains this chart. A few sentences to go with it would have been nice so readers know what they are looking at.

So I decided to take this image and redo it with better design decisions. I ended up with two different versions. I used Excel to create these makeovers. I’m sure I’ll do future makeovers with Power BI or Tableau, but I think Excel worked just fine for this, and is probably what was used to make the original data viz.

Version 1

KCStat Chart Makeover Version 1

First, I added some explanatory text for context to help communicate my message. If you don’t know about the Citizen Work Sessions, this chart might be confusing. The note about the shift in spending to public safety came from the FY 2016 – 2021 Citywide Business Plan. I thought it was important context to note as city council members and government officials have hard decisions to make when it comes to the budget.

So my data viz does two things:

  • explains the allocation of budget across goal areas
  • notes that there was a budget cut and shows its impact on the budget and allocation

You’ll notice in this version I used data bars embedded in my table rather than embedding a table inside of my chart.  Microcharts can be a good way to visually represent data while still providing detailed numbers. While tables primarily interact with our verbal system, graphs interact with our visual system to give us a quick picture of how different data points are related. The data bars form a horizontal bar chart, which is a great way to show a categorical comparison. Here I’m comparing the initial budget amount allocated for each goal area. My rows are ordered by descending value of the budget amounts.

Also, because I now have my labels on rows, I have plenty of room to type the full name of the goal area without abbreviating or wrapping.

I added the percent cut data because after seeing the numbers, I immediately start doing the math of (499 – 515)/515 to understand the difference. So I just saved everyone some time and showed it in my table. The absolute numbers are definitely useful, but it’s also important to understand the impact of the cut to each goal. A 16% cut could be devastating to an already small budget. I used conditional highlighting to help emphasize the impact of the cut, so the largest cut has the most intense color.

KCMO didn’t provide the individual data points for the markers, so I couldn’t use the individual data points in my chart makeover. But I think I might have chosen not to show the individual data points anyway because the value that they add is not worth the space they take up in the visualization. And they’re not core to my message. What I’m trying to show is where people generally made cuts when faced with that decision. I don’t have to show all the data points just because I have them.

The strategic use of color draws your eyes to the important parts of the data (the allocation to goals and the effect of the budget cut) rather than the box around the chart title or the table column header, as it did in the original visualization.

Version 2

KCStat Chart Makeover Version 2

I used the same explanatory text in this version. This time, I created a clustered horizontal bar chart and directly labeled the bars, removing the need to show a table. This chart shows me two things: the allocation of the budget across goals and the difference between the original budget and the average budget after cuts. This time I’m not focusing on the impact of the cut to the total budget for the goal, just visually showing where cuts happened and allowing the reader to make the comparison with the bars.

Again, I switched to a horizontal bar chart so I have a good categorical comparison and my labels have plenty of room and don’t have to be wrapped or abbreviated. I put the legend at the top left so you have that information before going to read the chart (our eyes tend to move across the chart in a “z”).

And again your eyes are drawn toward the color first, which is where the important data is located.

Data Visualization Critiques & Makeovers

Feel free to leave comments on how you would improve my chart makeovers. I would appreciate the feedback.

If you like chart makeovers, there are several sites that critique and redesign data visualizations. Here are some of my favorites:

 

Storytelling with Tableau

In addition to writing here on my personal blog, I also occasionally blog for BlueGranite. I contributed this week’s Demo Day blog article and video on Improving Data Viz Effectiveness. I’ll let you check them out on the BlueGranite site, but I wanted to point out a great feature of Tableau that I’m learning to appreciate: Stories. We often talk about story-telling with data in the analytics community, and Tableau has made it very easy to implement this concept.

In Tableau, a story is a sheet in which you can embed and arrange other sheets and dashboards. Each individual sheet/dashboard embedded in a story is called a story point.

Stories retain the interactivity built into the included sheets. This allows me to provide some guided navigation to the consumers of my viz, but also gives them some control through the interactivity of filters and selections.

Data Viz demo day image
Tableau story with navigator

As Robert Kosara pointed out when the story feature was first released, stories can be a superior alternative to exporting images or grabbing screenshots to present your data viz because you can answer questions with your connected data as you present it. I agree, assuming you can maintain connectivity to your data sources and Tableau (if using Tableau Server).

The entirety of my Demo Day video is a Tableau story in presentation (full screen) mode. I didn’t need to supplement with PowerPoint slides because I could add story points with text and images in Tableau. It feels more natural to add a paragraph of text to a sheet in Tableau than it does in other data viz tools. I think that encourages data viz designers to provide more (and many times much needed!) context to our data viz.

If you are trying out Tableau story points for the first time, Matt Francis has some good tips on his blog.  For some inspiration, check out my favorite example of story points: The Simpsons Vizipedia.

Color Coding Values in Power View Maps Based Upon Positive/Negative Sign

Power View can be a good tool for interactive data visualization and data discovery, but it has a few limitations with its mapping capabilities.

  • Power View only visualizes data on maps using bubbles/pies.
  • The size of the bubble on a Power View map can be misleading during analysis for data sets with small values and negative values.
  • By default, Power View can only display one measure on a map, which it uses to determine the size of the bubble.

Microsoft offers other mapping capabilities in Power Map and SSRS maps, so if all you need is a map, you can explore those other tools to determine if they better meet your needs. Check out my presentation on Choosing Your Geospatial Mapping Tool for more info on requirements, capabilities, and limitations of maps in Power View, Power Map, and SSRS. If you want to see a map in the context of other charts, and you want the interactive highlighting and filtering of Power View, you may look for workarounds to the second and third issues listed above.

I recently had a client express disappointment about the display of negative values in Power View maps. A negative value simply gets a small bubble, and a positive value gets a larger bubble. You have to hover over the bubbles to see each value and work out how the bubble sizes relate to the change in sign from positive to negative. This gets a little tedious and minimizes the usefulness of the bubble map in the first place. The example Power View map below shows the year over year change in trespassing incidents in my city from 2013 to 2014. The small bubble has a value of -16, and the large bubble to the right has a value of 6.

PV Map Neg Values

One way to better present this type of data in a Power View map is to have positive values represented in one color and negative values represented in another, with the bubble sizes showing the absolute value of the change. This proved to be less than straightforward as I worked my way through it. You can’t just make separate measures for the positive and negative values because Power View won’t let you put two measures on a map in that manner. You can’t simply create a calculated measure that determines the sign of your values and put it in the legend because Power View expects categorical data for the legend and won’t accept a numerical data field there. Power View also doesn’t recognize calculated measures that return a text value (they don’t show up in the field list), so you can’t create a measure that turns the sign into a text value (“Positive” or “Negative”) in an effort to make it more categorical.

But you can use dynamic segmentation to show the second measure as a dimension. There is a great video on MSBI Academy that walks you through this. Although the video shows two different measures on a map, you can use this method to color code positive and negative values of a single measure. First, create a disconnected table for the number sign attributes. It should contain at least:

  • a text field with a descriptor
  • a field for the minimum value of the range
  • a field for the maximum value of the range.

The video shows storing that table in Excel (which is great if you plan to make changes), but you can now also paste the table directly into Power Pivot without any link to the table in Excel. My final table looks like the image below.

PV Map Sign Table
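For reference, the table itself is tiny: a descriptor plus a range for each sign. The exact Min and Max bounds don’t matter much as long as they comfortably cover your data; the bounds below are just placeholders I chose, not necessarily the ones in the screenshot above.

Pos/Neg    Min         Max
Negative   -1000000    0
Positive   0           1000000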

Next you need a calculation that uses the disconnected table and displays the absolute value of the measure you want to use for the bubble size.

Change In Incidents YOY Pos/Neg:=
     IF(CALCULATE(COUNTROWS('Sign'),
     FILTER('Sign', [Change in Incidents YOY] >= Sign[Min] 
     && [Change in Incidents YOY] < Sign[Max])
     ),
     ABS([Change in Incidents YOY])
     )
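To see why this works, take the -16 bubble from the example map: only the Negative row of the Sign table satisfies the FILTER condition, so the measure returns ABS(-16) = 16 under the Negative legend value and returns blank under Positive. Each location therefore gets one bubble whose size is the absolute value of the change and whose color reflects the sign.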

Then you can build your map.  Use the Pos/Neg field for color, the new calculation for size, and your location fields of choice.

PV Map Pos Neg 1 Layer

Note that you will now see a warning message in the Power View fields list that relationships may be needed. You can disregard that. The use of a disconnected table is the cause of that message, and everything is working as intended.

PV Rel Msg

If your map has no drill down levels, you can stop here. Your map will now have the values for positive and negative in the legend with bubbles of two colors on the map. A bubble whose signed value is 5 will be the same size as a bubble whose signed value is -5 with the color indicating the sign of the number. In this analysis, a positive number indicates an increase in trespassing, so I chose a theme and ordered my values in the sign table such that the positive values were assigned the brighter color. These are the bubbles that need attention since trespassing is rising in the related zip codes.

If you do have drill down levels in your map, you have a couple more steps. You need another measure that deals with the drill down path. Without this extra step you end up with pies and unexpected values.

PV Map Pos Neg 2 Layer Bad

My map allows the user to drill down from zip code to beat. My new calculation references the top level of my drill down path ([Zip Code]), the calculation I just created ([Change In Incidents YOY Pos/Neg]), and the second level of the drill down path in my map ([Beat]). Note that this calculation is specific to the drill down path used in the map. If you have several maps with different drill down paths, you will need to create several calculations for this to work.

Change In Incidents YOY Pt 2:=
     IF(COUNTROWS(ALLSELECTED('KCPD Crime Data 2014'[Zip Code]))=1, 
     [Change In Incidents YOY Pos/Neg], 
     CALCULATE([Change In Incidents YOY Pos/Neg], 
     ALLSELECTED('KCPD Crime Data 2014'[Beat]))
     )

Now I can put this new measure in place of the original one, and the pies will go away, returning the expected color coded bubbles and values.

PV Map Pos Neg 2 Layers SBS

I’m speaking at PASS Summit 2014

PASS Summit 2014 Speaking Badge

I’m pleased and honored to say that I will be speaking at PASS Summit 2014, November 4 – 7 in Seattle. I will be presenting one general session and one lightning session. I’m excited to be an attendee and a speaker at the world’s largest gathering of SQL Server and BI professionals. I hope you can attend my sessions, described below.

Choose Your Mapping Adventure: Visualizing Your Geospatial Data

BI Information Delivery, general session

You have geospatial data and you know that plotting it on a map will help deliver insights. Or maybe you just keep hearing “geospatial” as one of the current BI buzzwords and you want to be informed about your options when you get to it. The Microsoft BI stack provides several options for visualizing location data. But, how do you know which tool will best serve your purpose, or what factors are even important in making this decision?

In this session, we’ll identify factors to consider when deciding which tool to use to visualize your data in maps, such as system/software requirements, internet connectivity requirements, availability and type of location data, required level of technical understanding, and more.

We’ll also explore the requirements and features of maps in Power Map, Power View, Reporting Services, and the Bing Maps add-in for Excel. You’ll leave this session knowing your options for creating maps and confident in your ability to choose the best path/tool within Excel and SQL Server.

Why Data Visualization Matters for All Data Professionals

BI Information Delivery, lightning talk

As a data professional, there is a good chance that you will have to visually represent data for coworkers, clients, or even just yourself. Understanding how our eyes and brains interpret visual information can help you present data in a way that is both effective in communicating your message and visually appealing.

In this session, we’ll look at some neuropsychology concepts and data visualization best practices, then discuss some simple ways to employ them to help you professionally communicate a message with data. Whether you present database performance information, financial forecasts, or marketing campaign results, these tips will help you minimize distractions and impress busy colleagues with your data communication skills.

If you haven’t registered for PASS Summit yet, there’s still time. The cost of the full conference is $1895 if you register by September 26. Use the KC SQL Server User Group discount code USNCS25 to get $150 off of registration and support our local user group. There is also a discount code for the PASS Business Analytics Virtual Chapter for $150 off: VCSUM04. I hope to see you there!

 

Power Pivot: Dynamically Identifying Outliers with DAX

I have been working on a project in which we were looking at durations as an indicator of service levels and customer satisfaction, specifically the maximum duration and average duration. We built a small dimensional data mart in SQL Server to house our data, then pulled it in to Excel using Power Pivot so we could make a few Excel and Power View reports and provide the client with the model for further ad hoc reporting and analysis. After the first iteration of development, we got feedback from the client that there were several outliers that were skewing the metrics. They wanted to see average and max duration for whatever group of objects they had chosen and then see the calculations after removing the top and bottom 5% of values.

Here is a small sample of my fact table, called Object Processing.  Field names and some small details have been changed to protect the innocent.

ObjectKey   StatusKey   CreationDateKey   GeographyKey   Duration
1           1           20140501          1              2193218
2           1           20140503          2              2114354
3           2           20140429          3

The first four columns are foreign keys to dimensions. The object dimension contains an object ID, name, owner, and content group. The status dimension contains an ID and description of the current processing status. The date dimension is a standard date dimension representing the date the object was created. The geography dimension contains an ID, country, and region. The duration column represents the number of seconds the object has spent processing. A new row is inserted into the fact table for each object created.

First, I have a few simple DAX calculations. These calculations work great on their own until you need to remove the outliers:

Total Duration:= SUM('Object Processing'[Duration])
Avg Duration:= AVERAGE('Object Processing'[Duration])
Max Duration:= MAX('Object Processing'[Duration])

To remove the outliers we need to rank the objects by duration. I first read Alberto Ferrari’s blog post on dynamic ranking in Power Pivot. It demonstrates a method to rank items within the table as a calculated column, which didn’t quite work for me because I wanted to dynamically rank items after filters were applied (ex: rank only the objects with a create date of yesterday and a status of completed). Then I found a series of posts on PowerPivotPro.com about Dynamic TopN Reports Via Slicers (parts 1, 2, 3, 4). That was what I needed to work through my similar problem.

First, I created a calculated measure for ranking the objects. If an object had no duration (because it had been created but had not yet reported a duration), I wanted it to be blank rather than rank as the lowest duration. In this formula, I’m ranking Object IDs in descending order based upon my total duration calculation.

Object Duration Rank:=
           IF(ISBLANK([Total Duration]), BLANK(),
           RANKX(ALLSELECTED(Object[Object ID]), [Total Duration], , 0))

Next, I needed to identify the top and bottom 5%. First, I counted the total number of objects excluding null durations:

Object Count For Outliers:=
               CALCULATE(DISTINCTCOUNT('Object Processing'[ObjectKey]),
               ALLSELECTED('Object Processing'),
               NOT(ISBLANK('Object Processing'[Duration])))

Then I calculated the number of objects in the top 5%, giving me the rank of the last member of my top outliers group:

Outlier Count:=ROUND([Object Count For Outliers]*.05, 0)

To get the rank of the first member of the bottom outliers group I calculated:

Low Outlier Rank:=[Object Count For Outliers]-[Outlier Count]
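To make that concrete: if 200 objects reported a duration, [Outlier Count] would be ROUND(200 * .05, 0) = 10 and [Low Outlier Rank] would be 190, so ranks 1 through 10 (the longest durations) and ranks 190 and above (the shortest) are treated as outliers.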

Finally, I wrote a calculation that determines if the object is in range (not an outlier):

Is In Range:=
               IF(ISBLANK([Total Duration]), BLANK(),
               IF([Object Duration Rank]>[Outlier Count] && 
               [Object Duration Rank]<[Low Outlier Rank], 1, 0))

Now I could filter my pivot table containing the Is In Range calculation for only those values equal to 1 to see the average and max duration with outliers removed.

Ultimately, I produced a summary report similar to this:

Object Duration Summary

We were able to compare object durations for objects created in the last 1, 7, and 30 days, including and excluding outliers. The conditional highlighting is set so that cells are highlighted if the duration for that time period is greater than the duration for the prior time period listed below it. For example, the average object duration for yesterday is 5:06:02, which is greater than the average duration for the last 7 days of 4:54:14.

I’m still working on a DAX calculated measure that will return the average object duration excluding outliers in one fell swoop, rather than needing to filter out the outliers in a pivot table.
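In the meantime, here is a rough sketch of the direction I’m leaning (untested, and it assumes one fact row per object, as in the model above): iterate over the selected objects, keep only the ones flagged as in range, and average their durations.

Avg Duration No Outliers:=
               AVERAGEX(
               // keep only the objects currently flagged as in range
               FILTER(ALLSELECTED(Object[Object ID]), [Is In Range] = 1),
               // with one fact row per object, Total Duration is that object's duration
               [Total Duration]
               )

A matching MAXX over the same FILTER would give the max duration without outliers. I consider this a sketch rather than a final answer until I confirm that the ranking measure behaves the same inside the iterator as it does in the pivot table.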