Data Visualization, Microsoft Technologies, Power BI

Violin Plots in Power BI

In case you aren’t familiar, I would like to introduce you to the violin plot.

A violin plot is a nifty chart that shows both distribution and density of data. It’s essentially a box plot with a density plot on each side. Box plots are a common way to show variation in data, but their limitation is that you can’t see frequency of values. In other words, you can see statistics such as min, max, median, mean, or quartiles, but you can’t see the individual values nor how often they occurred.

example box plot
Example box plot showing min, max, median, and quartiles

The violin plot overcomes this limitation (by adding the density plot) without taking up much more room on the canvas.

In Power BI, you can quickly make a violin plot using a custom visual. The Violin Plot custom visual (created by Daniel Marsh-Patrick) has many useful formatting options. First, you can choose to turn off the box plot and just show the density plot. Or you can choose to trade the box plot for a barcode plot.

box plot with bar code plot
Box plot with barcode plot in Power BI

Formatting the Violin Plot

There are several sections of formatting for this visual. I’ll call out a few important options here. First, the Violin Options allow you to change the following settings related to the density plot portion of the violin plot.

Formatting options for the density plot in the violin plot.

Inner padding controls the space between each violin. Stroke width changes the width of the outline of the density plot. The sampling resolution controls the detail in the outline of the density plot. Check out Wikipedia to learn more about the kernel density estimation options.

The Sorting section allows you to choose the display order of the plots. In the example above, the sort order is set to sort by category. You can then choose whether the items should be sorted ascending or descending.

Sorting options for the violin plot in Power BI

Next you can choose the color and transparency of your density plot. You have the ability to choose a different color for each plot (violin), but please don’t unless you have a good reason to do so.

Data colors controls the color of the density plot

The Combo Plot section controls the look of the bar code plot or box plot. Inner padding determines the width of the plot. Stroke width controls the width of the individual lines in the bar code plot, or the outline and whiskers in the box plot. You can change the box or bar color in this section. For the barcode plot, you can choose whether you would like to show the first and third quartiles and the median the color, line thickness, and line style of their markers.

Also make sure to check out the Tooltip section. It allows you to show various statistics in the tooltip without having to calculate them in DAX or show them elsewhere in the visual.

Violin Plot Custom Visual Issues & Limitations

This is a well designed custom visual, but there are a couple of small things I hope will be enhanced in the future.

  1. The mean and standard deviation in the tooltip are not rounded to a reasonable amount of digits after the decimal.
  2. The visual does not seem to respond to the Show Data keyboard command that places data in a screen reader friendly table.

As always, make sure to read the fine print about what each custom visual is allowed to do. Make sure you understand the permissions you are granting and that you and your organization are ok with them. For example, I used public weather data in my violin plot, so I had no concerns about sending the data over the internet. I would be more cautious if I were dealing with something more sensitive like patient data in a hospital.

Update: The creator of the violin plot left a great comment on this post to let us know that the two capabilities listed above are boilerplate from Microsoft. The violin plot privacy policy can be found here. The violin plot does not specifically send data over the internet.

Introducing the Violin Plot to Your Users

I think violin plots (especially the flavor with the bar code plot) are fairly easy to read once you have seen one, but many people may not be familiar with them. In my weather example above, I made an extra legend to help explain what the various colors of lines mean.

Another thing you might consider is adding an explainer on how to read the chart. I used a violin plot with a coworker who does not nerd out on data viz to show query costs from queries executed in SQL Server, and I added an image that explains how to read the chart.

Example explanation of how to read a violin plot

After all, we use data visualization to analyze and present data effectively. If our users don’t understand it, we aren’t doing our job well.

Have you used the violin plot in Power BI? Leave me a comment about what kind of data you used it with and how you liked the resulting visual.

Azure, Azure Data Factory, Microsoft Technologies, Power BI

How Many Data Gateways Does My Azure BI Architecture Need?

Computer with lock protecting data

It’s not always obvious when you need a data gateway in Azure, and not all gateways are labeled as such. So I thought I would walk through various applications that act as a data gateway and discuss when, where, and how many are needed.

Note: I’m ignoring VPN gateways and application gateways for the rest of this post. You could set up a VPN gateway to get rid of some of the data gateways, but I’m assuming your networking/VPN situation is fixed at this point and working from there. This is a common case in my experience as many organizations are not ready to jump into Express Route when first planning their BI architecture.

Let’s start with what services may require you to use a data gateway.

You will need a data gateway when you are using Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, Azure Logic Apps, Azure Data Factory, or Azure ML with a data source/destination that is in a private network that isn’t connected to your Azure subscription with a VPN gateway. Note that a private network includes on-premises data sources and Azure Virtual Machines as well as Azure SQL Databases and Azure SQL Data Warehouses that require use of VNet service endpoints rather than public endpoints.  

Luckily, many of these services can use the same data gateway. Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, and Logic Apps all use the On Premises Data Gateway. Azure Data Factory (V1 and V2) and Azure Machine Learning Studio use the Data Factory Self-Hosted Integration Runtime.

On Premises Data Gateway (Power BI et al.)

If you are using one or more of the following:

  • Power BI
  • Azure Analysis Services
  • PowerApps
  • Microsoft Flow
  • Logic Apps

and you have a data source in a private network, you need at least one gateway. But there are a few considerations that might cause you to set up more gateways.

  1. Your services must be in the same region to use the same gateway. This means that your Power BI/Office 365 region and Azure region for your Azure Analysis Services resource must match for them to all use one gateway.  If you have resources in different regions, you will need one gateway per region.
  2. You may want high availability for your gateway. You can create high availability clusters so when one gateway is down, traffic is rerouted to another available gateway in the cluster.
  3. You may want to segment traffic to ensure the necessary resources for certain ad hoc live/direct queries or scheduled refreshes. If your usage and refresh patterns warrant it, you may want to set up one gateway for scheduled refreshes and one gateway for live/direct queries back to any on-premises data sources. Or you might make sure live/direct queries for two different high-traffic models go through different gateways so as not to block each other. This isn’t always warranted, but it can be a good strategy.

Data Factory Self-hosted Integration Runtime

If you are using Azure Data Factory (V1 or V2) or Azure ML with a data source in a private network, you will need at least one gateway. But that gateway is called a Self-hosted Integration Runtime (IR).

Self-hosted IRs can be shared across data factories in the same Azure Active Directory tenant. They can be associated with up to four machines to scale out or provide higher availability. So while you may only need one node, you might want a second so that your IR is not the single point of failure.

Or you may want multiple IRs to boost throughput of copy activities. For instance, copying from an on-premises file server with one IR node is about 195 Megabytes per second (MB/s). But with 4 IR nodes, it can be as fast as 505 MB/s.

Factors that Affect the Number of Data Gateways Needed

The main factors determining the number of gateways you need are:

  1. Number of data sources in private networks (including Azure VNets)
  2. Location of services in Azure and O365 (number of regions and tenants)
  3. Desire for high availability
  4. Desire for increased throughput or segmented traffic

If you are importing your data to Azure and using an Azure SQL DB with no VNet as the source for your Power BI model, you won’t need an On Premises Data Gateway. If you used Data Factory to copy your data from an on-premises SQL Server to Azure Data Lake and then Azure SQL DB, you need a Self-Hosted Integration Runtime.

If all your source data is already in Azure, and your source for Power BI or Azure Analysis Services is Azure SQL DW on a VNet, you will need at least one On-Premises Data Gateway.

If you import a lot of data to Azure every day using Data Factory, and you land that data to Azure SQL DW on a VNet, then use Azure Analysis Services as the data source for Power BI reports, you might want a self-hosted integration runtime with a few nodes and a couple of on-premises gateways clustered for high availability.

Have a Plan For Your Gateways

The gateways/integration runtimes are not hard to install. They are just often not considered, and projects get stalled waiting until a machine is provisioned to install them on. And many people forget to plan for high availability in their gateways. Make sure you have the right number of gateways and IR nodes to get your desired features and connectivity. You can add gateways/nodes later, but you don’t want to get caught with no high availability when it really matters.

Conferences, Data Visualization, Microsoft Technologies, PASS Summit

Join me for the PASS Data Expert Series Feb 7

I’m honored to have one of my PASS Summit sessions chosen to be part of the PASS Data Expert Series on February 7. PASS has curated the top-rated, most impactful sessions from PASS Summit 2018 for a day of solutions and best practices to help keep you at the top of your field. There are three tracks: Analytics, Data Management, and Architecture. My session is in the Analytics track along with some other great sessions from Alberto Ferrari, Jen Underwood, Carlos Bossy, Matt How, and Richard Campbell.

The video for my session, titled “Do Your Data Visualizations Need a Makeover?”, starts at 16:00 UTC (9 AM MT). I’ll be online in the webinar for live Q&A and chat related to the session.

I hope you’ll register and chat with me about data visualizations in need of a makeover on February 7.

Azure, Microsoft Technologies

The Necessary Extras That Aren’t Shown in Your Azure BI Architecture Diagram

When we talk about Azure architectures for data warehousing or analytics, we usually show a diagram that looks like the below.

Modern DW Architecture
Modern DW Architecture https://azure.microsoft.com/en-us/solutions/architecture/modern-data-warehouse/

This diagram is a great start to explain what services will be used in Azure to build out a solution or platform. But many times, we add the specific resource names and stop there. If you have built several projects in Azure, you will know there some other things for which you will need to plan. So what’s missing?

Azure Active Directory

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

For instance, users who will participate in Data Factory pipeline development must be assigned to the Data Factory Contributor role in Azure. Users can authenticate via AAD to log in to Azure SQL DW. You can use AD Groups to grant permissions to users interacting with various Azure resources such as Azure SQL DW or SQL DB as well as Azure Analysis Services and Power BI. It’s more efficient to manage permissions for groups than to do it for each user. You’ll need a plan to manage group membership.

You may also need to register applications in Azure Active Directory to allow Azure services to authenticate and interact with each other. While the guidance for many services is now to use managed identities, this may not be available for every situation just yet.  

If your organization has some infrastructure on premises, it is likely that they have Active Directory on premises as well. So you will want to make sure you have a solution in place to sync your on-premises and Azure Active Directory.

Networking

Virtual Networks (or VNets) allow many types of Azure resources to securely communicate with each other, the internet, and on-premises networks. You can have multiple virtual networks in an Azure subscription. Each virtual network is isolated from other VNets, unless you set up VNet peering.

Some PaaS services such as Azure Storage Accounts, Azure SQL DW, and Azure Analysis Services support Virtual Network Service Endpoints. A common usage scenario is to set up VNets and VNet Service Endpoints to connect resources to on-premises networks. Some organizations prefer to use VNet Service Endpoints instead of public service endpoints, making it so that traffic can only access the resource from within the organization’s local network.

In order to connect a VNet to an on-premises network or another VNet (outside of peering), you’ll need a VPN Gateway. You’ll need to identify the most appropriate type of VPN Gateway: Point-to-Site, Site-to-Site, or Express Route. These offerings differ based on bandwidth, protocols supported, routing, connection resiliency, and SLAs. Pricing can vary greatly based upon your gateway type.

While VNets and VPN Gateways are probably the most common networking resources in Azure, there are many other networking services and related design decisions to consider as you plan an Azure deployment.

Data Gateways

Your BI solution may be entirely in Azure, but if you need to retrieve data from data sources in a private network (on premises or on a VNet), you’ll need a gateway. If you are using Azure Data Factory, you’ll need a Self-hosted Integration Runtime (IR). If the source for your Power BI or Azure Analysis Services model is on a private network, you’ll need an On-Premises Data Gateway. You can use the same gateway to connect to Analysis Services, Power BI, Logic Apps, Power Apps, and Flow. If you will have a mix of Analysis Services and Power BI models sharing the same gateway, you’ll need to make sure that your Power BI region and your AAS region match.

These gateways require that you have a machine inside the private network on which to install them. And if you want to scale out or have failover capabilities, you may need multiple gateways and multiple VMs. So while you may be building a solution in Azure, you might end up with a few on-premises VMs to be able to securely move source data.

Dev and Test Environments

Our nice and tidy diagram above is only showing production. We also need at least a development environment and maybe one or more test environments. You’ll need to decide how to design your dev/test environments. You may want duplicate resources in a separate resource group; e.g. a dev resource group that contains a dev Data Factory, a dev Azure SQL DW, a dev Azure Analysis Services, etc. While separating environments by resource group is common, it’s not your only option. You will need to decide if you prefer to separate environments by resource group, subscription, directory, or some combination of the three.

ARM templates and PowerShell are very useful for deploying updates and creating new environments. Also, take a look at Azure Blueprints.

You’ll also want to investigate ways to keep the costs of non-prod environments down via smaller-sized resources or pausing or deleting resources where applicable.

Plan For, Don’t Worry About, the Extras

There are several other ancillary Azure services that could/should be part of your solution.

  • For source control, GitHub and Azure Repos have the easiest integration, especially with Azure Data Factory. You’ll not only want source control for things like database projects and Data Factory pipelines, but also possibly for ARM templates and PowerShell scripts used to deploy resources (think: infrastructure as code).
  • You’ll want to set up Azure Monitoring, including alerts to let you know when processes and services are not running as intended.
  • If you want more cost management support than just setting a spending limit on a subscription (if it is available for your subscription type), it may be helpful to set budgets in Azure so you can be notified when you hit certain percentages or amounts.

Be sure to think through the entire data/solution management lifecycle. You may want to include extra services for cataloging, governing, and archiving your data.

This may sound like a complex list, but these resources and services are fairly easy to work with. Azure Active Directory has a user-friendly GUI in the portal, and commands can be automated with PowerShell, requiring relatively little effort to manage users and groups to support your BI environment. VNets and VPN Gateways are a little more complex, but there are step-by-step instructions available for many tasks in the Microsoft Docs. The Power BI Gateway and ADF IR have quick and easy GUI installers that just take a few minutes. You can automate Azure deployments with Azure Pipelines or PowerShell scripts.

None of these things are really that awful to implement or manage in Azure, unless you weren’t aware of them and your project is paused until you can get them set up.

Is there anything else you find is commonly left out when planning for BI solutions in Azure? Leave me a comment if you would like to add to the list.

Update (1/26/19): Helpful readers have commented on other aspects of your Azure BI architecture they felt were often overlooked:

  • Make sure you have a plan for how to process your Azure Analysis Services model. Will you use Azure Automation? Call an Azure Function from Data Factory?
  • Be sure to organize and tag your resources appropriately to help you understand and control costs.
  • Don’t forget Azure Key Vault. This will help you keep keys and passwords secure.

(Thanks to Chad Toney, Shannon Holck, and Santiago Cepas for these suggestions.)

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Tab Order Enhances Power BI Report Accessibility

Confused starLike a diamond in the sky.
How I wonder what you are!
Twinkle, twinkle, little star,
Twinkle, twinkle, little star,
Up above the world so high,
How I wonder what you are!

Confused?

You were probably expecting a different order. Order is an important element when singing a song or telling a story or explaining information.

In western cultures we tend to read left to right, top to bottom, making a Z-pattern. This applies to books and blogs as well as reports and data visualizations. But if you are using the keyboard to navigate in a Power BI report, the order in which you arrive at visuals will not follow your vision unless you set the new tab order property. If you have low or no vision, this becomes an even bigger issue because you may not be able to see that you are navigating visuals out of visual order because the screen reader just reads whatever comes next.

It takes effort to consume each visual, and many visuals need the context of the other visuals around them to be most useful. When we present information out of order, we are putting more cognitive load on our users, forcing them to hold information in their limited working memory until they arrive at another visual that helps put the pieces together to make sense.

What is Tab Order?

Tab order is the order in which users interact with the items on a page using the keyboard. Generally, we want tab order to be predictable and to closely match the visual order on the page (unless there is a good reason to deviate). If you press the tab key in a Power BI report, you might be surprised at the seemingly random order in which you move from visual to visual on the page. The default order is the order in which the visuals were placed on the page in Power BI Desktop, or the last modified order in PowerBI.com if you have edited your report there.

I wrote about the issues with tab order in Power BI back in February and posted an idea for it. So I’m quite happy to see it come to fruition. Not only does it increase usability and accessibility, it also helps meet WCAG Success Criterion 2.4.3: Focus Order (accessibility compliance guidelines, for those who do not geek out on this stuff).

How To Set Tab Order In Power BI Desktop

To set the tab order of visuals on a report page in Power BI Desktop, go to the View tab, open the Selection Pane and select Tab Order at the top of the Selection Pane.

TabOrder1TabOrder2

From there, you can move visuals up and down in order, or hide them from tab order completely. This is helpful if you have decorative items on the page whose selection has no value to the user.

To change the tab order, you can either drag an item to a new position in the list, or you can select the item and click the up or down arrows above the list.

TabOrderCHange.gif

In case you missed it, slicers are now keyboard accessible. If you would like users to select values in slicers before using the other visuals on the page, make sure to put the slicers early in the tab order.

It only takes a minute to set the tab order, but it greatly increases usability for keyboard users.

Data Visualization, Microsoft Technologies, PASS Summit, Power BI

Power BI Visual Usability Checklist

At PASS Summit, I presented a session called “Do Your Data Visualizations Need a Makeover?”. In my session I explained how we often set ourselves up for failure when conducting explanatory data visualization before we ever place a visual on the page by not preparing appropriately, and I provided tips to improve. I also gave examples of visual design mistakes I see often. I polled the audience, and they shared some mistakes that they had seen often or that really bothered them. If you missed my presentation, you can watch it on PASS TV or Youtube.

As a companion to my presentation, I created the Power BI Visualization Usability Checklist. For those who are new to data visualization in Power BI, or those that want to employ some type of quality check, I think this is a good place to start. I occasionally do data viz makeover engagements to help people create a report that is more engaging and more widely adopted. This list draws from that experience as well as the tweaks I find myself making to my own Power BI reports. And now I have added a few things that my PASS Summit audience mentioned – thanks to those who shared their suggestions and experiences!

I’m not here to tell you to always use a certain color theme or font, or that everything should be a bar chart. Data visualization is situational and dependent upon your intended audience. I hope I can encourage you to consider your audience, how they take in information, and what information they are looking for.

This checklist provides guidelines to help make sure your report communicates your intended message in a way that works for your intended audience. It has two pages. The Data Viz Usability Checklist page contains the main checklist for you to use while building or reviewing a Power BI report.

Screenshot of Power BI Visualization Usability Checklist

The Data Viz Usability Concepts page gives you quick definitions and links for further reading about the underlying design concepts that inform my list.

Screenshot of Power BI Visualization Usability Concepts

Download the checklist here. I also have a checklist for accessibility in Power BI reports which you can find here.

If you have a suggestion to add to either list, please leave me a comment!

 

Microsoft Technologies, PASS Summit, Personal

Learning Better Presentation Skills (T-SQL Tuesday #108)

TSQLTuesdayThis month’s T-SQL Tuesday is hosted by Malathi Mahadevan (@SqlMal). The topic is to pick one thing I would like to learn that is not SQL Server.

I’m going to go a different direction than I think most people will. I spend a lot of time learning new technologies in Azure, but I am also focusing on learning better presentation skills and improving my use of related technologies. Yes, that often means PowerPoint. But sometimes I do presentations directly in Power BI when I am presenting data or mostly doing demos. Building presentations requires tech, design, and speaking skills. And I enjoy that mix.

I enjoy presenting at user groups and conferences, and lately I’ve been branching out in the types of presentations I give. At PASS Summit this year, I delivered a pre-con and a general session, and I participated in a panel and the BI Power Hour.  Each one required a slightly different presentation style.

Just like we may cringe when we go back and look at old code and wonder what we were thinking, I have the same reaction when I go back and look at old presentations.

As a data viz person, you would think I would be better at building engaging presentations since a lot of the data viz concepts apply to visual presentation, but it’s still a struggle and a constant endeavor to improve.  I have made some progress to date. Below is a sample from a presentation on good report design practices in SSRS that I gave in 2015.

Old Presentation

While it’s not the worst slide I’ve ever seen, it’s definitely not the best. Here are a few slides from my PASS Summit presentation called “Do Your Data Visualizations Need A Makeover?”, which cover the same topic as the above slide.

NewPresentation1NewPresentation2NewPresentation3NewPresentation4

The new slides are much more pleasant and engaging. I have changed to this style based upon what I have learned so far from Echo Rivera’s blog and her Countdown to Stellar Slides program. I use more and larger images and less text on each slide. This naturally leads to having more slides, but I spend less time on each one. And there is less of a chance that I will revert to just reading from a slide since there is just less text. I get to tell you about what that one sentence means to me.

My goals is to learn how to deliver presentations that are more accessible and more engaging.

I blogged about accessible slide templates earlier this year. I got interested in accessibility when I was learning how to make more accessible Power BI reports.  I want people to feel welcome and get something out of my talks, even if they have visual, auditory, or information processing challenges. So far, what I have learned is that I can be more inclusive with a handful of small changes.

To continue my learning about presentation delivery, I plan to:

And of course, I plan to give presentations so I can try out what I learn and improve from there. I have already submitted to SQLSaturday Colorado Springs, and I’m sure I will add more presentations next year.

If you have resources that have been particularly helpful in improving your presentation delivery, please leave them in the comments.

Happy T-SQL Tuesday!