Data Visualization, Microsoft Technologies, Power BI

Check Out the Updated Violin Plot Power BI Custom Visual

I wrote about the violin plot custom visual by Daniel Marsh-Patrick back in February. I thought it was a good visual then, but version 1.3 has recently been released with some nice enhancements.

First, the violin plot is now a certified custom visual. This means that it has been tested by the Power BI team to ensure it meets certain requirements, one of which is that the visual does not access external services or resources. You can be confident your data isn’t being sent externally when you use the violin plot.

As for the functional enhancements, a new legend has been added. This is a great addition to make the chart clearer and more easily read, especially for audiences that may not be familiar with how the violin plot works. The customizable legend calls out what markers are used for mean, median, and quartiles.

Violin plot with the new legend

Another good enhancement is the new column option for the combo plot. It allows you to have your plot show as a range column chart where the bar spans from the minimum value to the maximum value for each category. I chose to show only the mean and median in the example below, but you can also add quartiles.

Violin plot using the new column plot type

The barcode plot also has a nice enhancement in the tooltip. Now when you hover over a bar, you can see the number of samples with the highlighted value.

You can check out Daniel’s blog post to see the full list of enhancements for this release. Tweet me if you make something cool and shareable with the violin plot in Power BI.

Data Visualization, Microsoft Technologies, Power BI

Turning a Corporate Color Palette into a Data Visualization Color Palette

Last week, I had a conversation on twitter about dealing with corporate color palettes that don’t work well for data visualization. Usually, this happens because corporate palettes are designed with websites and/or marketing collateral in mind rather than information graphic design. This often results in colors being too bright, dark, or dull to be used together in a report. Sometimes the colors aren’t easily distinguishable from each other. Other times, the colors needed for various situations (main color, ancillary colors, highlight color, error color, KPIs, text, borders) aren’t available in the corporate palette.

You can still stay on brand and create a consistent user experience with a color palette optimized for data visualization. But you may not be using the exact hex values as defined in the corporate palette. I like to say the data viz color palette is “inspired by” the marketing color palette.

I asked on twitter if anyone had a corporate color palette they needed to convert into a data visualization palette, and someone volunteered theirs. So this post is my walk-through of how I went about creating the palette.

Step 1: Identify a Main Color

There is often a main color in the corporate color palette. If that color is a medium intensity color, I usually include that color in my color palette as is. If it is excessively dark, light, or gray, I’ll either tweak the color a bit or use the second color in the color palette.

Step 2: Choose a Color Scheme

Next, I need to decide what kind of relationship the other colors will have with the main color. In other words, I have to decide what type of color scheme I want to use. I tend to go for monochromatic or analogous color schemes. Complimentary color schemes can be difficult, depending on your main color. I generally try to stay away from using reds and greens together in the same palette because it’s hard to stay colorblind-friendly and because the primary colors together can make it feel like a Christmas or kindergarten theme. I often try to reserve reds and oranges to draw attention to specific data points, but that isn’t a hard and fast rule.

I need 2 – 4 ancillary colors to go with my main color. I rarely need to use all 4 colors together in one chart, but there are some cases such as line charts with 4 series where that will be necessary. People can preattentively distinguish up to about 7 colors at once, so I need to use fewer than 7 colors in a single chart. If I encounter a situation where I feel like I need more than 4 colors together, I re-evaluate my choice of chart type and my use of color within the chart.

Also, I want the colors to be roughly the same level of brightness and intensity. Most importantly, the colors need to be easily distinguishable from each other.

RGB Color Wheel

Step 3: Choose Highlight and Error Colors

We often need to draw attention to specific data points to indicate that they require attention. This is usually because a value is outside of the expected range. KPIs are common in Power BI reports, I need to make sure I have a color to indicate “bad” statuses. I also like to have a highlight color that doesn’t necessarily mean “bad”, just “look here”. These highlight and error colors need to be noticeably different from my other colors so that they draw attention to the data points where they are used.

Step 4: Add Border and Background Colors

I like to add grays and browns to go with my color scheme. I’ll use them mostly for borders, grid lines, text, and light background shades. But also, I want to make sure I have 8 colors in my palette. If I have fewer than 8 colors, Power BI will add colors from the default palette at the end of my colors to fill out the full 8 columns.

Color Palette Creation Example

The original corporate color palette that I was given had a lot of colors.

Primary Corporate Colors
Secondary Corporate Colors

The primary colors go all the way around the color wheel. I definitely don’t want to use them all together. The secondary colors have the beginnings of a monochromatic blue palette, an analogous blue/green palette, or an analogous orange/red/purple palette.

I don’t need all of these different hues. I need 8 medium-intensity colors. Power BI will add black and white and provide the shades and tints for me.

Main Color

I’m keeping the main color as it is. It is bright and saturated enough to not be dull/boring and also not so bold as to leave no room for bolder colors to be used to highlight specific data points.

Chart using the main color of my palette

Color Scheme

I choose an analogous color scheme, which means I pick colors that are next to my main color on the color wheel. Since blue is my main color, I stick with cool colors for the ancillary colors.

Main color plus 3 ancillary colors

I want my 4 colors to be easily distinguishable from each other, and I want them to be roughly the same intensity and brightness.

Highlight and Error Colors

I’m adding yellow and red to my palette. The yellow can be a generic highlight color as well as a “caution” color. The red can be my “bad” color. I’m checking that my colors are easily distinguishable for various types of color vision deficiency.

I confirm that my highlight and error colors are easily distinguishable from the other colors for the most common types of color vision deficiency. I can also see here that my second and fourth colors look a bit similar on the deuteranopia line, so I’ll have to be careful how I use them together, perhaps switching to a shade or tint of the fourth color if needed.

Border and Background Colors

Now I add my grays and browns to use for formatting. This completes my color palette.

Full color palette for Power BI Theme

Power BI Theme

I can take the hex values for my colors and drop them in the color theme generator on PowerBI.Tips to get my JSON theme file.

{
"name":"Analogous",
 "dataColors":["#00aaff", "#00c2b2", "#213dca", "#7514ff", "#ffd500", "#ff002b","#768389","#987665"]
 }

When I import my theme file into my Power BI report, I get the additional tints and shades from the colors I provided.

Next I try out my new color theme in a report to see if I need to tweak any colors. This is the true test. The colors may look great in little boxes, but they might need to be altered to work on a full report page. The shade of purple that I used originally (not shown in this blog post) was a bit too intense compared to the other colors, so I replaced it with a slightly muted tint that better matched the other colors. That is the type of thing you will notice when applying your theme to a report. Don’t get too stuck on finding the exact perfect colors. Colors look slightly different on different screens. Just make sure nothing is inadvertently distracting.

Helpful Color Tools

I’m currently using https://color.mediaandme.be to create my color palettes. It’s free, and it allows me to add many (> 6) colors to my palette. Other benefits:

  • It shows me what all the colors look like together
  • It provides a colorblindness simulator
  • It lets me easily tweak hue, saturation, and brightness
  • It generates a link for the color palette I create so I can easily share it with others for feedback

When I need ideas for how to tweak a color, I use https://www.colorhexa.com. I picked the gray color in my palette by getting the grayest tone of my main color from ColorHexa.

Further Reading on Color in Data Visualization

Tiger Color: Basic Color Schemes – Introduction to Color Theory
Maureen Stone: Expert Color Choices
Billie Gray: Another post about colours for data visualisation. Part 3 — DIY Palettes
Christopher Healy: Perception in Visualization

Data Visualization, Microsoft Technologies, Power BI

Storytelling Without Data?

There are many great resources out there for data visualization. Some of my favorite data viz people are Storytelling With Data (b|t), Alberto Cairo (b|t), and Andy Kirk (b|t). I often reference their work when I present on data visualization in the context of the Microsoft Data Platform. Their work has helped me choose the right chart types for my data and format it so it communicates the right message and looks good. But one topic I have noticed that most data visualization experts rarely address is the question I get in almost every presentation I give:

How do I tell a story with data when my data is always changing?

If you are a BI/report developer, you know this challenge well. You may follow all the guidelines: choose a good color palette, make visuals that highlight the important data points, get rid of clutter. But what happens when your data refreshes tomorrow or next month or next year? It’s much easier to make a chart with static data. You can format it so it communicates exactly the right message. But out here in Automated Reporting Land, that is not the end of our duties. We have to make some effort to accommodate future data values. Refreshing data creates issues such as:

  • We can’t put a lot of static explanatory text on the page to help our audience understand the trends because the trends will change as the data is refreshed. Example: “Sales are up year over year, and the East region is the top contributor to current quarter profit” is true today, but may not be true next month.
  • My chart may look good today, but new values may come in that change the scale and make it difficult to see small numbers compared to a very large number. Example: A bar chart showing inventory levels by product looks reasonable today because all products have a stock level between 1 and 50. But next month, a popular new product comes in, and you have 500 of them, which changes the scale of your bar chart and makes every other product’s inventory super small and not easy to compare.
  • I can’t statically highlight outliers because my outliers will change over time. Example: I have a chart that shows manufacturing defects by line, and I want to highlight that the dog treats line has too many defects. I can’t just select that data point on the chart and change the color because next month the dog treats line might be doing fine.

How do we form a message with known metrics and data structure but without specific data values?

When people have asked me about this in the past, I gave an answer similar to the following:

Instead of a message about specific data values, I consider my audience and the metrics they care about and come up with the top 2 or 3 questions they would want to answer from my report. Then I build charts that address those questions and put them in an order that matches the way my intended audience would analyze that information. This might include ordering the visuals appropriately on a summary page as well as creating drillthrough paths to more information based upon items and filters selected to help my user understand the reasons for their current values.

While this isn’t horrible advice, I felt I needed a better answer on this issue. So I sought advice from Andy Kirk, and he responded brilliantly!


To the issue of situations where data is periodically refreshed, I see most encounters (ie. the relationship between reader and content) characterised less by storytelling (the act of the creator) and more by storyforming (the act of the reader). 

Andy Kirk

Storyforming

Andy went on to explain, “What I mean by this is that usually the meaning of the data is unique to each reader and their own knowledge, their own needs, their own decision-contexts. So rather than the creator ‘saying something’ about such frequently changing data in the form of messages or headlines, often it might be more critical to provide visual context in the form of signals (like colours or markers/bandings) that indicate to the reader that what they are looking at is significantly large/small/above average/below average/off-target/on-target/etc. but leave it up to THEM to arrive at their own story.”

Then he gave an example: “I find this context a lot working with a football club here in the UK. Their data is changing every 3-7 days as new matches are played. So the notion of a story is absent from the visuals that I’m creating for their players/management/coaches. They know the subject (indeed better than me, it’s their job!), they don’t need me to create any display that ’spells out’ for them the story/meaning, rather they need – like the classic notion of a dashboard – clear signals about what the data shows in the sense of normal/exceptional/improving/worsening.”

Andy also agreed that a key aspect of storyforming is that interactive controls (slicers, filters, cross-filtering capabilities) in your report consumption tools give the reader the means to overcome the visual chaos that different data shapes may cause through natural variation over time.

Less Eye Rolling, More Storyforming

If you are a BI developer and have rolled your eyes or sighed in frustration when someone mentioned storytelling in data visualization, try thinking about it as storyforming. Make sure you have the right characters (categories and metrics) and major plot points (indicators of size, trend, or variance from target). You are still responsible for choosing appropriate chart types and colors to show the trends and comparisons, but don’t be so focused on the exact data points.

Many reporting tools (including Power BI) allow you to perform relative calculations (comparison to a previous period, variance from goal, variance from average) to dynamically create helpful context and identify trends and outliers. In Power BI, there are custom visuals that allow you to add dynamically generated natural language explanations if you feel you need more explanatory text (ex 1, ex 2, ex 3). And Power BI will soon be getting expression-based formatting for title text in visuals, which can also help with providing insights in the midst of changing values.

But mostly, try to design your report so that users can slice and filter to get to what matters to them. Then let your users fill in the details and meaning for themselves.

Microsoft Technologies, Power BI

Power BI: Where Should My Data Live? Webcast

When you start a Power BI project, you need to decide how and where you should store the data in your dataset. There are three “traditional” options:

  • Imported Model: Data is imported and compressed and stored in the PBIX file, which is then published to the Power BI Service (or Report Server if you are on-prem)
  • Live Connection: Data is stored in Analysis Services and your Power BI dataset is really a reference to the Analysis Services database.
  • DirectQuery: Data remains in the source system and Power BI stores metadata and a reference to the source data, executing live queries when a user interacts with a report

As Power BI has evolved, there are now some variations and additions to those options. Composite models allow you to combine imported data sources and DirectQuery data sources. We also now have dataflows, which allow you to use self-service data prep to define and share reusable data entities.

Each of these options has its advantages and limitations. There is no single right answer of which one you should always pick.

If you have been struggling with this topic, or just want to double-check your thinking, please join me and Kerry Tyler (@AirborneGeek on twitter) for our Denny Cherry & Associates Consulting webcast on April 5th at 12pm Mountain / 2pm Eastern.

The webcast will review your options for where to store data and explain the factors that should be used in determining what option is right for you. Obvious requirements such as data size, license costs and management, and desired data latency will be discussed. We’ll also talk about other factors such as the desire for self-service BI and avoiding data model sprawl. We’ll have content to present, but we are also happy to take questions during the webcast.

Register for the webcast today and join us next Friday, April 5th.

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Power BI Now Has Keyboard Accessible Visual Interactions

The March 2019 release of Power BI Desktop has brought us keyboard accessible visual interactions. One of Power BI’s natural strengths is that you can click on a data point within a visual and have it cross-highlight or cross-filter the other visuals on a page. But keyboard-only users weren’t able to use this feature until now. This greatly raises the accessibility of the Power BI report consumption experience.

Below is a demonstration of interacting with a visual using keyboard commands. Notice how I can select specific data points within the line chart, and the other charts on the page filter based upon the selection.

Keyboard commands can now access visual interactions

If you are a person that prefers to use a keyboard over a mouse, this might also be something you want to try. Relevant keyboard commands include:

  • Ctrl + right arrow: Move focus into the chart area of the visual
  • Tab or Arrow keys: navigate between data points (or legend items in a chart that contains a legend)
  • Enter or Space: select a data point within a visual
  • Ctrl + Enter or Ctrl + Space: select multiple data points within a visual
  • Ctrl + shift + c: clear all selections

I think this was the last big missing piece of keyboard accessibility. I’m excited to see its impact on users. If you try keyboard accessible visual interactions, or are taking advantage of keyboard accessibility in Power BI in general, please let me know how you are liking it. Tweet me at @mmarie or send me a note via my blog contact form.

Microsoft Technologies, Power BI

What Data Is Being Sent Externally By Power BI Visuals?

As you build your Power BI reports, you may want to use maps and custom visuals. Have you thought about data privacy and what data is getting shared by those visuals? If you have sensitive data in your reports, you will probably want to look into this.

Maps

Most built-in visuals do not share data externally. But the default map visuals in Power BI need to share data with Bing Maps in order to geocode data points. Microsoft has documented that what is shared depends on the map type and the type of location data used.

For bubble maps, no data is sent to Bing if you are using only longitude and latitude. Otherwise, any data in the Location and filter buckets is sent to Bing.

For filled maps, data in the Location, Longitude, and Latitude buckets is shared with Bing.

For ArcGIS maps, Esri staff have said “Only the data needed to geocode the address (i.e., fields placed in the Location field well) are passed to Esri servers. These data are only used to generate the information used to place the locations on the map and they are not stored by Esri servers.”

Custom Visuals

Custom visuals are created by developers using the custom visuals SDK. There are 3 ways to deploy custom visuals for use by report builders:

  • Sharing a .pbiviz file
  • Adding to the organizational visuals tenant repository
  • Having users download visuals from the marketplace (AppSource)

When you receive and use a .pbiviz file, you are taking responsibility for assessing data security. When your Power BI admin deploys a custom visual to the organizational visuals repository, they are approving the visual for use inside your organization.

If you are using visuals from the marketplace, you will need to check the information provided about data privacy, and it’s not all that straightforward at the moment.

Certified Visuals

One thing that makes understanding data privacy in custom visuals easier is the designation of a certified custom visual. One of the requirements for certification is ” Does not access external services or resources, including but not limited to, no HTTP/S or WebSocket requests go out of Power BI to any services.”

You can find the list of currently certified custom visuals on this page. Custom visuals are also identified in the marketplace by a blue star with a check mark.

Power BI Custom Visual Marketplace with Certified Visuals
Power BI Custom Visual Marketplace

Uncertified Visuals

Uncertified visuals are not necessarily less secure than custom visuals, but they have not been tested by Microsoft to confirm security. Any random person can create a custom visual, which is pretty cool and also potentially dangerous for data security.

Microsoft has tried to remind you of this in App Source. On each visual that is not certified, you will see a notice, such as the one below.

Disclaimer Placed on Uncertified Custom Visuals
Disclaimer Placed on Uncertified Custom Visuals

This is helpful, but there are a couple of problems.

  1. This information is at the bottom of the the visual description. Once you select a visual from the list, you most likely need to scroll down to see this note.
  2. This is generic, boilerplate language added by AppSource. They are basically saying that it is possible that the visual might send data over the internet. They are not telling you that it definitely does.

As far as I can tell, that notice is put on any custom visual that isn’t certified. That leaves things a little murky. If you want to know the data privacy policy of a particular custom visual, you have to find the link in the description in AppSource and go read it.

Apparently, every custom visual in the marketplace must have an accompanying privacy policy. You can find the link to the privacy policy by looking at AppSource in a browser (rather than within the window in Power BI desktop). The privacy policy is in the left column near the bottom.

Example Custom Visual With Link to Privacy Policy
Example Custom Visual With Link to Privacy Policy

But there doesn’t seem to be a standard template for the privacy policy, so you may not find what you are looking for there. For example, the Violin Plot has a very simple and helpful privacy policy.

Violin Plot Privacy Policy
Violin Plot Privacy Policy

The privacy policy on the custom visuals by Enlighten Designs (maker of Enlighten Aquarium, Enlighten Data Story, Enlighten Waffle Chart, and several other visuals) is a more generic document that does not speak specifically to the data sent externally by each visual. Sometimes custom visuals have other notes posted in their publicly available repo that might be helpful, but that is not a guarantee.

So if you really need to know and the privacy policy doesn’t state it, you might have to contact the creator of the custom visual and ask them. But even then, you are just trusting their answer. If a third party were going to make a malicious visual that steals your data, they probably wouldn’t tell you they were doing that.

What Have We Learned?

Determining what data is sent externally by a custom visual is not simple. While many visuals are sandboxed and do not communicate externally, some of them do, and any uncertified custom visual might.

Report Creators

If you to need visualize sensitive data, try to stick with the built-in visuals and certified custom visuals as much as possible to keep your data secure. If you have someone in your organization who can take the time to review the code of uncertified visuals (assuming it is available) to ensure your data privacy, that’s great. But most people don’t have that resource available. If you want to use an uncertified visual, check the privacy policy or other notes found in the links posted on AppSource and understand that you are trusting that is accurate.

Custom Visual Creators

It would be great if you could explicitly state what (if any) data is sent to external services or resources so users can feel more comfortable and be able to use your custom visual more often, such as in scenarios where they have sensitive data. If you could add this to your description where everyone can see it in AppSource without having to click through a bunch of links, that would be awesome. It would still be great if you could note it in your privacy policy or somewhere else that is directly linked from AppSource.

Data Visualization, Microsoft Technologies, Power BI

Violin Plots in Power BI

In case you aren’t familiar, I would like to introduce you to the violin plot.

A violin plot is a nifty chart that shows both distribution and density of data. It’s essentially a box plot with a density plot on each side. Box plots are a common way to show variation in data, but their limitation is that you can’t see frequency of values. In other words, you can see statistics such as min, max, median, mean, or quartiles, but you can’t see the individual values nor how often they occurred.

example box plot
Example box plot showing min, max, median, and quartiles

The violin plot overcomes this limitation (by adding the density plot) without taking up much more room on the canvas.

In Power BI, you can quickly make a violin plot using a custom visual. The Violin Plot custom visual (created by Daniel Marsh-Patrick) has many useful formatting options. First, you can choose to turn off the box plot and just show the density plot. Or you can choose to trade the box plot for a barcode plot.

box plot with bar code plot
Box plot with barcode plot in Power BI

Formatting the Violin Plot

There are several sections of formatting for this visual. I’ll call out a few important options here. First, the Violin Options allow you to change the following settings related to the density plot portion of the violin plot.

Formatting options for the density plot in the violin plot.

Inner padding controls the space between each violin. Stroke width changes the width of the outline of the density plot. The sampling resolution controls the detail in the outline of the density plot. Check out Wikipedia to learn more about the kernel density estimation options.

The Sorting section allows you to choose the display order of the plots. In the example above, the sort order is set to sort by category. You can then choose whether the items should be sorted ascending or descending.

Sorting options for the violin plot in Power BI

Next you can choose the color and transparency of your density plot. You have the ability to choose a different color for each plot (violin), but please don’t unless you have a good reason to do so.

Data colors controls the color of the density plot

The Combo Plot section controls the look of the bar code plot or box plot. Inner padding determines the width of the plot. Stroke width controls the width of the individual lines in the bar code plot, or the outline and whiskers in the box plot. You can change the box or bar color in this section. For the barcode plot, you can choose whether you would like to show the first and third quartiles and the median the color, line thickness, and line style of their markers.

Also make sure to check out the Tooltip section. It allows you to show various statistics in the tooltip without having to calculate them in DAX or show them elsewhere in the visual.

Violin Plot Custom Visual Issues & Limitations

This is a well designed custom visual, but there are a couple of small things I hope will be enhanced in the future.

  1. The mean and standard deviation in the tooltip are not rounded to a reasonable amount of digits after the decimal.
  2. The visual does not seem to respond to the Show Data keyboard command that places data in a screen reader friendly table.

As always, make sure to read the fine print about what each custom visual is allowed to do. Make sure you understand the permissions you are granting and that you and your organization are ok with them. For example, I used public weather data in my violin plot, so I had no concerns about sending the data over the internet. I would be more cautious if I were dealing with something more sensitive like patient data in a hospital.

Update: The creator of the violin plot left a great comment on this post to let us know that the two capabilities listed above are boilerplate from Microsoft. The violin plot privacy policy can be found here. The violin plot does not specifically send data over the internet.

Introducing the Violin Plot to Your Users

I think violin plots (especially the flavor with the bar code plot) are fairly easy to read once you have seen one, but many people may not be familiar with them. In my weather example above, I made an extra legend to help explain what the various colors of lines mean.

Another thing you might consider is adding an explainer on how to read the chart. I used a violin plot with a coworker who does not nerd out on data viz to show query costs from queries executed in SQL Server, and I added an image that explains how to read the chart.

Example explanation of how to read a violin plot

After all, we use data visualization to analyze and present data effectively. If our users don’t understand it, we aren’t doing our job well.

Have you used the violin plot in Power BI? Leave me a comment about what kind of data you used it with and how you liked the resulting visual.