Microsoft Technologies, Power BI

What Data Is Being Sent Externally By Power BI Visuals?

As you build your Power BI reports, you may want to use maps and custom visuals. Have you thought about data privacy and what data is getting shared by those visuals? If you have sensitive data in your reports, you will probably want to look into this.

Maps

Most built-in visuals do not share data externally. But the default map visuals in Power BI need to share data with Bing Maps in order to geocode data points. Microsoft has documented that what is shared depends on the map type and the type of location data used.

For bubble maps, no data is sent to Bing if you are using only longitude and latitude. Otherwise, any data in the Location and filter buckets is sent to Bing.

For filled maps, data in the Location, Longitude, and Latitude buckets is shared with Bing.

For ArcGIS maps, Esri staff have said “Only the data needed to geocode the address (i.e., fields placed in the Location field well) are passed to Esri servers. These data are only used to generate the information used to place the locations on the map and they are not stored by Esri servers.”

Custom Visuals

Custom visuals are created by developers using the custom visuals SDK. There are three ways to deploy custom visuals for use by report builders:

  • Sharing a .pbiviz file
  • Adding to the organizational visuals tenant repository
  • Having users download visuals from the marketplace (AppSource)

When you receive and use a .pbiviz file, you are taking responsibility for assessing data security. When your Power BI admin deploys a custom visual to the organizational visuals repository, they are approving the visual for use inside your organization.

If you are using visuals from the marketplace, you will need to check the information provided about data privacy, and it’s not all that straightforward at the moment.

Certified Visuals

One thing that makes understanding data privacy in custom visuals easier is the designation of a certified custom visual. One of the requirements for certification is that the visual "Does not access external services or resources, including but not limited to, no HTTP/S or WebSocket requests go out of Power BI to any services."

You can find the list of currently certified custom visuals on this page. Certified visuals are also identified in the marketplace by a blue star with a check mark.

Power BI Custom Visual Marketplace with Certified Visuals

Uncertified Visuals

Uncertified visuals are not necessarily less secure than certified visuals, but they have not been tested by Microsoft to confirm security. Any random person can create a custom visual, which is pretty cool and also potentially dangerous for data security.

Microsoft has tried to remind you of this in AppSource. On each visual that is not certified, you will see a notice such as the one below.

Disclaimer Placed on Uncertified Custom Visuals

This is helpful, but there are a couple of problems.

  1. This information is at the bottom of the visual description. Once you select a visual from the list, you most likely need to scroll down to see this note.
  2. This is generic, boilerplate language added by AppSource. They are basically saying that it is possible that the visual might send data over the internet. They are not telling you that it definitely does.

As far as I can tell, that notice is put on any custom visual that isn’t certified. That leaves things a little murky. If you want to know the data privacy policy of a particular custom visual, you have to find the link in the description in AppSource and go read it.

Apparently, every custom visual in the marketplace must have an accompanying privacy policy. You can find the link to the privacy policy by looking at AppSource in a browser (rather than within the window in Power BI desktop). The privacy policy is in the left column near the bottom.

Example Custom Visual With Link to Privacy Policy

But there doesn’t seem to be a standard template for the privacy policy, so you may not find what you are looking for there. For example, the Violin Plot has a very simple and helpful privacy policy.

Violin Plot Privacy Policy

The privacy policy on the custom visuals by Enlighten Designs (maker of Enlighten Aquarium, Enlighten Data Story, Enlighten Waffle Chart, and several other visuals) is a more generic document that does not speak specifically to the data sent externally by each visual. Sometimes custom visuals have other notes posted in their publicly available repo that might be helpful, but that is not a guarantee.

So if you really need to know and the privacy policy doesn’t state it, you might have to contact the creator of the custom visual and ask them. But even then, you are just trusting their answer. If a third party were going to make a malicious visual that steals your data, they probably wouldn’t tell you they were doing that.

What Have We Learned?

Determining what data is sent externally by a custom visual is not simple. While many visuals are sandboxed and do not communicate externally, some of them do, and any uncertified custom visual might.

Report Creators

If you need to visualize sensitive data, try to stick with the built-in visuals and certified custom visuals as much as possible to keep your data secure. If you have someone in your organization who can take the time to review the code of uncertified visuals (assuming it is available) to ensure your data privacy, that's great. But most people don't have that resource available. If you want to use an uncertified visual, check the privacy policy or other notes found in the links posted on AppSource, and understand that you are trusting that the information is accurate.

Custom Visual Creators

It would be great if you could explicitly state what data (if any) is sent to external services or resources, so users can feel more comfortable and use your custom visual more often, including in scenarios where they have sensitive data. If you could add this to your description, where everyone can see it in AppSource without having to click through a bunch of links, that would be awesome. Failing that, noting it in your privacy policy or somewhere else directly linked from AppSource would still help.

Data Visualization, Microsoft Technologies, Power BI

Violin Plots in Power BI

In case you aren’t familiar, I would like to introduce you to the violin plot.

A violin plot is a nifty chart that shows both distribution and density of data. It's essentially a box plot with a density plot on each side. Box plots are a common way to show variation in data, but their limitation is that you can't see the frequency of values. In other words, you can see statistics such as min, max, median, mean, or quartiles, but you can't see the individual values or how often they occurred.

Example box plot showing min, max, median, and quartiles

The violin plot overcomes this limitation (by adding the density plot) without taking up much more room on the canvas.

In Power BI, you can quickly make a violin plot using a custom visual. The Violin Plot custom visual (created by Daniel Marsh-Patrick) has many useful formatting options. First, you can choose to turn off the box plot and just show the density plot. Or you can choose to trade the box plot for a barcode plot.

Box plot with barcode plot in Power BI

Formatting the Violin Plot

There are several sections of formatting for this visual. I’ll call out a few important options here. First, the Violin Options allow you to change the following settings related to the density plot portion of the violin plot.

Formatting options for the density plot in the violin plot.

Inner padding controls the space between each violin. Stroke width changes the width of the outline of the density plot. The sampling resolution controls the detail in the outline of the density plot. Check out Wikipedia to learn more about the kernel density estimation options.
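As background, the density curve on each side of a violin is typically a kernel density estimate, which in its standard textbook form is

\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right)

where the x_i are your data points, K is the kernel function you pick in this section, and h is the bandwidth. The sampling resolution determines how many points the curve is evaluated at, which is why higher values give a smoother outline. (This is the general formula, not a claim about this visual's exact implementation.)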

The Sorting section allows you to choose the display order of the plots. In the example above, the sort order is set to sort by category. You can then choose whether the items should be sorted ascending or descending.

Sorting options for the violin plot in Power BI

Next you can choose the color and transparency of your density plot. You have the ability to choose a different color for each plot (violin), but please don’t unless you have a good reason to do so.

Data colors controls the color of the density plot

The Combo Plot section controls the look of the barcode plot or box plot. Inner padding determines the width of the plot. Stroke width controls the width of the individual lines in the barcode plot, or the outline and whiskers in the box plot. You can change the box or bar color in this section. For the barcode plot, you can also choose whether to show markers for the first and third quartiles and the median, as well as the color, line thickness, and line style of those markers.

Also make sure to check out the Tooltip section. It allows you to show various statistics in the tooltip without having to calculate them in DAX or show them elsewhere in the visual.

Violin Plot Custom Visual Issues & Limitations

This is a well designed custom visual, but there are a couple of small things I hope will be enhanced in the future.

  1. The mean and standard deviation in the tooltip are not rounded to a reasonable number of digits after the decimal.
  2. The visual does not seem to respond to the Show Data keyboard command that places data in a screen reader friendly table.

As always, make sure to read the fine print about what each custom visual is allowed to do. Make sure you understand the permissions you are granting and that you and your organization are ok with them. For example, I used public weather data in my violin plot, so I had no concerns about sending the data over the internet. I would be more cautious if I were dealing with something more sensitive like patient data in a hospital.

Update: The creator of the violin plot left a great comment on this post to let us know that the capabilities listed in the fine print mentioned above are boilerplate from Microsoft. The violin plot privacy policy can be found here. The violin plot itself does not send data over the internet.

Introducing the Violin Plot to Your Users

I think violin plots (especially the flavor with the barcode plot) are fairly easy to read once you have seen one, but many people may not be familiar with them. In my weather example above, I made an extra legend to help explain what the various colors of lines mean.

Another thing you might consider is adding an explainer on how to read the chart. I used a violin plot with a coworker who does not nerd out on data viz to show query costs from queries executed in SQL Server, and I added an image that explains how to read the chart.

Example explanation of how to read a violin plot

After all, we use data visualization to analyze and present data effectively. If our users don’t understand it, we aren’t doing our job well.

Have you used the violin plot in Power BI? Leave me a comment about what kind of data you used it with and how you liked the resulting visual.

Azure, Azure Data Factory, Microsoft Technologies, Power BI

How Many Data Gateways Does My Azure BI Architecture Need?

It’s not always obvious when you need a data gateway in Azure, and not all gateways are labeled as such. So I thought I would walk through various applications that act as a data gateway and discuss when, where, and how many are needed.

Note: I’m ignoring VPN gateways and application gateways for the rest of this post. I’m assuming your networking/VPN situation is fixed at this point and working from there.

Let’s start with what services may require you to use a data gateway.

You will need a data gateway when you are using Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, Azure Logic Apps, Azure Data Factory, or Azure ML with a data source/destination that is in a private network. Note that a private network includes on-premises data sources and Azure Virtual Machines as well as Azure SQL Databases and Azure SQL Data Warehouses that require use of VNet service endpoints rather than public endpoints.  

Luckily, many of these services can use the same data gateway. Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, and Logic Apps all use the On-premises Data Gateway. Azure Data Factory (V1 and V2) and Azure Machine Learning Studio use the Data Factory Self-hosted Integration Runtime.

On-premises Data Gateway (Power BI et al.)

If you are using one or more of the following:

  • Power BI
  • Azure Analysis Services
  • PowerApps
  • Microsoft Flow
  • Logic Apps

and you have a data source in a private network, you need at least one gateway. But there are a few considerations that might cause you to set up more gateways.

  1. Your services must be in the same region to use the same gateway. This means that your Power BI/Office 365 region and the Azure region for your Azure Analysis Services resource must match for them all to use one gateway. If you have resources in different regions, you will need one gateway per region.
  2. You may want high availability for your gateway. You can create high availability clusters so that when one gateway is down, traffic is rerouted to another available gateway in the cluster (a setup sketch follows this list).
  3. You may want to segment traffic to ensure the necessary resources for certain ad hoc live/direct queries or scheduled refreshes. If your usage and refresh patterns warrant it, you may want to set up one gateway for scheduled refreshes and one gateway for live/direct queries back to any on-premises data sources. Or you might make sure live/direct queries for two different high-traffic models go through different gateways so as not to block each other. This isn’t always warranted, but it can be a good strategy.
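If you'd rather script gateway installation and registration than click through the GUI installer, something like the following may be possible with the DataGateway PowerShell module. This is a rough sketch under the assumption that the module is available in your environment; the cmdlet and parameter names are worth verifying against current documentation.

# Rough sketch using the DataGateway PowerShell module -- an assumption;
# verify cmdlet and parameter names against current docs before relying on it.
# Run on the machine inside the private network that will host the gateway.
Connect-DataGatewayServiceAccount              # sign in as a gateway admin
Install-DataGateway -AcceptConditions          # install the gateway bits

# Register this machine as a new gateway cluster. The recovery key is one
# you choose; keep it safe, as it is needed to add or restore cluster members.
$recoveryKey = Read-Host -AsSecureString -Prompt "Recovery key"
Add-DataGatewayCluster -GatewayName "bi-gateway-cluster" -RecoveryKey $recoveryKey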

Data Factory Self-hosted Integration Runtime

If you are using Azure Data Factory (V1 or V2) or Azure ML with a data source in a private network, you will need at least one gateway. But that gateway is called a Self-hosted Integration Runtime (IR).

Self-hosted IRs can be shared across data factories in the same Azure Active Directory tenant. They can be associated with up to four machines to scale out or provide higher availability. So while you may only need one node, you might want a second so that your IR is not the single point of failure.

Or you may want multiple IR nodes to boost throughput of copy activities. For instance, copying from an on-premises file server with one IR node runs at about 195 megabytes per second (MB/s). But with 4 IR nodes, it can be as fast as 505 MB/s.
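To make that concrete, here is a sketch of creating the self-hosted IR definition and retrieving the key used to register each node, using the Az.DataFactory cmdlets (all resource names are placeholders):

# Sketch: define a self-hosted IR in a data factory (Az.DataFactory module;
# resource group, factory, and IR names are placeholders).
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName "rg-bi-prod" `
    -DataFactoryName "adf-bi-prod" `
    -Name "SelfHostedIR" `
    -Type SelfHosted `
    -Description "IR for on-premises sources"

# Retrieve the authentication key; each machine (up to four) that joins
# this IR registers with one of these keys.
Get-AzDataFactoryV2IntegrationRuntimeKey `
    -ResourceGroupName "rg-bi-prod" `
    -DataFactoryName "adf-bi-prod" `
    -Name "SelfHostedIR"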

Factors that Affect the Number of Data Gateways Needed

The main factors determining the number of gateways you need are:

  1. Number of data sources in private networks (including Azure VNets)
  2. Location of services in Azure and O365 (number of regions and tenants)
  3. Desire for high availability
  4. Desire for increased throughput or segmented traffic

If you are importing your data to Azure and using an Azure SQL DB with no VNet as the source for your Power BI model, you won't need an On-premises Data Gateway. If you use Data Factory to copy your data from an on-premises SQL Server to Azure Data Lake and then to Azure SQL DB, you will need a Self-hosted Integration Runtime.

If all your source data is already in Azure, and your source for Power BI or Azure Analysis Services is Azure SQL DW on a VNet, you will need at least one On-premises Data Gateway.

If you import a lot of data to Azure every day using Data Factory, and you land that data to Azure SQL DW on a VNet, then use Azure Analysis Services as the data source for Power BI reports, you might want a self-hosted integration runtime with a few nodes and a couple of on-premises gateways clustered for high availability.

Have a Plan For Your Gateways

The gateways/integration runtimes are not hard to install. They are just often not considered, and projects get stalled waiting until a machine is provisioned to install them on. And many people forget to plan for high availability in their gateways. Make sure you have the right number of gateways and IR nodes to get your desired features and connectivity. You can add gateways/nodes later, but you don’t want to get caught with no high availability when it really matters.

Conferences, Data Visualization, Microsoft Technologies, PASS Summit

Join me for the PASS Data Expert Series Feb 7

I’m honored to have one of my PASS Summit sessions chosen to be part of the PASS Data Expert Series on February 7. PASS has curated the top-rated, most impactful sessions from PASS Summit 2018 for a day of solutions and best practices to help keep you at the top of your field. There are three tracks: Analytics, Data Management, and Architecture. My session is in the Analytics track along with some other great sessions from Alberto Ferrari, Jen Underwood, Carlos Bossy, Matt How, and Richard Campbell.

The video for my session, titled “Do Your Data Visualizations Need a Makeover?”, starts at 16:00 UTC (9 AM MT). I’ll be online in the webinar for live Q&A and chat related to the session.

I hope you’ll register and chat with me about data visualizations in need of a makeover on February 7.

Azure, Microsoft Technologies

The Necessary Extras That Aren’t Shown in Your Azure BI Architecture Diagram

When we talk about Azure architectures for data warehousing or analytics, we usually show a diagram that looks like the below.

Modern DW Architecture (https://azure.microsoft.com/en-us/solutions/architecture/modern-data-warehouse/)

This diagram is a great start to explain what services will be used in Azure to build out a solution or platform. But many times, we add the specific resource names and stop there. If you have built several projects in Azure, you will know there are some other things for which you will need to plan. So what's missing?

Azure Active Directory

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

For instance, users who will participate in Data Factory pipeline development must be assigned to the Data Factory Contributor role in Azure. Users can authenticate via AAD to log in to Azure SQL DW. You can use AD Groups to grant permissions to users interacting with various Azure resources such as Azure SQL DW or SQL DB as well as Azure Analysis Services and Power BI. It’s more efficient to manage permissions for groups than to do it for each user. You’ll need a plan to manage group membership.
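As a small illustration, creating an AD group for your Data Factory developers and granting it the Data Factory Contributor role might look something like this with the Az PowerShell module (a sketch; the group name and scope are placeholders):

# Sketch: create an Azure AD group and grant it a role at resource group
# scope (Az module; names and the subscription ID are placeholders).
$group = New-AzADGroup -DisplayName "BI Data Factory Developers" -MailNickname "bi-adf-devs"

New-AzRoleAssignment `
    -ObjectId $group.Id `
    -RoleDefinitionName "Data Factory Contributor" `
    -Scope "/subscriptions/<subscription-id>/resourceGroups/rg-bi-dev"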

You may also need to register applications in Azure Active Directory to allow Azure services to authenticate and interact with each other. While the guidance for many services is now to use managed identities, this may not be available for every situation just yet.  

If your organization has some infrastructure on premises, it is likely that they have Active Directory on premises as well. So you will want to make sure you have a solution in place to sync your on-premises and Azure Active Directory.

Networking

Virtual Networks (or VNets) allow many types of Azure resources to securely communicate with each other, the internet, and on-premises networks. You can have multiple virtual networks in an Azure subscription. Each virtual network is isolated from other VNets, unless you set up VNet peering.

Some PaaS services, such as Azure Storage Accounts, Azure SQL DW, and Azure Analysis Services, support Virtual Network Service Endpoints. A common usage scenario is to set up VNets and VNet Service Endpoints to connect resources to on-premises networks. Some organizations prefer VNet Service Endpoints over public service endpoints so that the resource can be accessed only from within the organization's network.
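For example, creating a VNet whose subnet has a service endpoint enabled for Azure SQL might look like this (a sketch using the Az.Network cmdlets; names and address ranges are placeholders):

# Sketch: a VNet with a subnet that has a service endpoint for Azure SQL
# (Az.Network module; names and address ranges are placeholders).
$subnet = New-AzVirtualNetworkSubnetConfig `
    -Name "data-subnet" `
    -AddressPrefix "10.0.1.0/24" `
    -ServiceEndpoint "Microsoft.Sql"

New-AzVirtualNetwork `
    -Name "vnet-bi" `
    -ResourceGroupName "rg-bi-prod" `
    -Location "eastus2" `
    -AddressPrefix "10.0.0.0/16" `
    -Subnet $subnet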

In order to connect a VNet to an on-premises network or another VNet (outside of peering), you'll need a VPN Gateway. You'll need to identify the most appropriate type of VPN Gateway: Point-to-Site, Site-to-Site, or ExpressRoute. These offerings differ based on bandwidth, protocols supported, routing, connection resiliency, and SLAs. Pricing can vary greatly based upon your gateway type.

While VNets and VPN Gateways are probably the most common networking resources in Azure, there are many other networking services and related design decisions to consider as you plan an Azure deployment.

Data Gateways

Your BI solution may be entirely in Azure, but if you need to retrieve data from data sources in a private network (on premises or on a VNet), you'll need a gateway. If you are using Azure Data Factory, you'll need a Self-hosted Integration Runtime (IR). If the source for your Power BI or Azure Analysis Services model is on a private network, you'll need an On-premises Data Gateway. You can use the same gateway to connect to Azure Analysis Services, Power BI, Logic Apps, PowerApps, and Microsoft Flow. If you will have a mix of Analysis Services and Power BI models sharing the same gateway, you'll need to make sure that your Power BI region and your AAS region match.

These gateways require that you have a machine inside the private network on which to install them. And if you want to scale out or have failover capabilities, you may need multiple gateways and multiple VMs. So while you may be building a solution in Azure, you might end up with a few on-premises VMs to be able to securely move source data.

Dev and Test Environments

Our nice and tidy diagram above is only showing production. We also need at least a development environment and maybe one or more test environments. You’ll need to decide how to design your dev/test environments. You may want duplicate resources in a separate resource group; e.g. a dev resource group that contains a dev Data Factory, a dev Azure SQL DW, a dev Azure Analysis Services, etc. While separating environments by resource group is common, it’s not your only option. You will need to decide if you prefer to separate environments by resource group, subscription, directory, or some combination of the three.

ARM templates and PowerShell are very useful for deploying updates and creating new environments. Also, take a look at Azure Blueprints.
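For instance, deploying the same ARM template to each environment with environment-specific parameter files is a common pattern (a sketch; file and resource group names are placeholders):

# Sketch: one ARM template, deployed per environment with its own
# parameter file (names and paths are placeholders).
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-bi-dev" `
    -TemplateFile ".\azuredeploy.json" `
    -TemplateParameterFile ".\azuredeploy.parameters.dev.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-bi-prod" `
    -TemplateFile ".\azuredeploy.json" `
    -TemplateParameterFile ".\azuredeploy.parameters.prod.json"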

You’ll also want to investigate ways to keep the costs of non-prod environments down via smaller-sized resources or pausing or deleting resources where applicable.

Plan For, Don’t Worry About, the Extras

There are several other ancillary Azure services that could/should be part of your solution.

  • For source control, GitHub and Azure Repos have the easiest integration, especially with Azure Data Factory. You’ll not only want source control for things like database projects and Data Factory pipelines, but also possibly for ARM templates and PowerShell scripts used to deploy resources (think: infrastructure as code).
  • You’ll want to set up Azure Monitoring, including alerts to let you know when processes and services are not running as intended.
  • If you want more cost management support than a subscription spending limit provides (spending limits are not available for every subscription type), it may be helpful to set budgets in Azure so you can be notified when you hit certain percentages or amounts (a sketch follows this list).
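For that last bullet, creating a budget might look something like the following sketch using New-AzConsumptionBudget (amounts, dates, and email addresses are placeholders, and it's worth verifying in the docs how the notification threshold is expressed in your module version):

# Sketch: a monthly cost budget that sends email at 90% of the amount.
# All values are placeholders; verify parameter semantics in the Az docs.
New-AzConsumptionBudget `
    -Name "bi-monthly-budget" `
    -Amount 1000 `
    -Category Cost `
    -TimeGrain Monthly `
    -StartDate "2019-02-01" `
    -EndDate "2020-02-01" `
    -ContactEmail "bi-team@example.com" `
    -NotificationKey "ninety-percent" `
    -NotificationThreshold 90 `
    -NotificationEnabled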

Be sure to think through the entire data/solution management lifecycle. You may want to include extra services for cataloging, governing, and archiving your data.

This may sound like a complex list, but these resources and services are fairly easy to work with. Azure Active Directory has a user-friendly GUI in the portal, and commands can be automated with PowerShell, requiring relatively little effort to manage users and groups to support your BI environment. VNets and VPN Gateways are a little more complex, but there are step-by-step instructions available for many tasks in the Microsoft Docs. The Power BI Gateway and ADF IR have quick and easy GUI installers that just take a few minutes. You can automate Azure deployments with Azure Pipelines or PowerShell scripts.

None of these things are really that awful to implement or manage in Azure, unless you weren’t aware of them and your project is paused until you can get them set up.

Is there anything else you find is commonly left out when planning for BI solutions in Azure? Leave me a comment if you would like to add to the list.

Update (1/26/19): Helpful readers have commented on other aspects of your Azure BI architecture they felt were often overlooked:

  • Make sure you have a plan for how to process your Azure Analysis Services model. Will you use Azure Automation? Call an Azure Function from Data Factory? (One option is sketched below.)
  • Be sure to organize and tag your resources appropriately to help you understand and control costs.
  • Don’t forget Azure Key Vault. This will help you keep keys and passwords secure.

(Thanks to Chad Toney, Shannon Holck, and Santiago Cepas for these suggestions.)
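For the Azure Analysis Services processing bullet above, one option is an Azure Automation runbook that calls Invoke-ProcessASDatabase from the SqlServer PowerShell module. A sketch (server, database, and credential names are placeholders):

# Sketch: fully process an Azure Analysis Services database from an
# Azure Automation runbook (SqlServer module; names are placeholders).
# Get-AutomationPSCredential is available inside Azure Automation.
$cred = Get-AutomationPSCredential -Name "aas-processing-account"

Invoke-ProcessASDatabase `
    -Server "asazure://eastus2.asazure.windows.net/myaasserver" `
    -DatabaseName "SalesModel" `
    -RefreshType "Full" `
    -Credential $cred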

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Tab Order Enhances Power BI Report Accessibility

Like a diamond in the sky.
How I wonder what you are!
Twinkle, twinkle, little star,
Twinkle, twinkle, little star,
Up above the world so high,
How I wonder what you are!

Confused?

You were probably expecting a different order. Order is an important element when singing a song or telling a story or explaining information.

In western cultures we tend to read left to right, top to bottom, making a Z-pattern. This applies to books and blogs as well as reports and data visualizations. But if you are using the keyboard to navigate in a Power BI report, the order in which you arrive at visuals will not follow the visual order on the page unless you set the new tab order property. If you have low or no vision, this becomes an even bigger issue, because you may not be able to see that you are navigating visuals out of visual order; the screen reader just reads whatever comes next.

It takes effort to consume each visual, and many visuals need the context of the other visuals around them to be most useful. When we present information out of order, we are putting more cognitive load on our users, forcing them to hold information in their limited working memory until they arrive at another visual that helps put the pieces together to make sense.

What is Tab Order?

Tab order is the order in which users interact with the items on a page using the keyboard. Generally, we want tab order to be predictable and to closely match the visual order on the page (unless there is a good reason to deviate). If you press the tab key in a Power BI report, you might be surprised at the seemingly random order in which you move from visual to visual on the page. The default order is the order in which the visuals were placed on the page in Power BI Desktop, or the last modified order in PowerBI.com if you have edited your report there.

I wrote about the issues with tab order in Power BI back in February and posted an idea for it. So I’m quite happy to see it come to fruition. Not only does it increase usability and accessibility, it also helps meet WCAG Success Criterion 2.4.3: Focus Order (accessibility compliance guidelines, for those who do not geek out on this stuff).

How To Set Tab Order In Power BI Desktop

To set the tab order of visuals on a report page in Power BI Desktop, go to the View tab, open the Selection Pane and select Tab Order at the top of the Selection Pane.

From there, you can move visuals up and down in order, or hide them from tab order completely. This is helpful if you have decorative items on the page whose selection has no value to the user.

To change the tab order, you can either drag an item to a new position in the list, or you can select the item and click the up or down arrows above the list.

In case you missed it, slicers are now keyboard accessible. If you would like users to select values in slicers before using the other visuals on the page, make sure to put the slicers early in the tab order.

It only takes a minute to set the tab order, but it greatly increases usability for keyboard users.

Data Visualization, Microsoft Technologies, PASS Summit, Power BI

Power BI Visual Usability Checklist

At PASS Summit, I presented a session called “Do Your Data Visualizations Need a Makeover?”. In my session I explained how, by not preparing appropriately, we often set ourselves up for failure in explanatory data visualization before we ever place a visual on the page, and I provided tips to improve. I also gave examples of visual design mistakes I see often. I polled the audience, and they shared some mistakes that they had seen often or that really bothered them. If you missed my presentation, you can watch it on PASS TV or YouTube.

As a companion to my presentation, I created the Power BI Visualization Usability Checklist. For those who are new to data visualization in Power BI, or those who want to employ some type of quality check, I think this is a good place to start. I occasionally do data viz makeover engagements to help people create a report that is more engaging and more widely adopted. This list draws from that experience as well as the tweaks I find myself making to my own Power BI reports. And now I have added a few things that my PASS Summit audience mentioned – thanks to those who shared their suggestions and experiences!

I’m not here to tell you to always use a certain color theme or font, or that everything should be a bar chart. Data visualization is situational and dependent upon your intended audience. I hope I can encourage you to consider your audience, how they take in information, and what information they are looking for.

This checklist provides guidelines to help make sure your report communicates your intended message in a way that works for your intended audience. It has two pages. The Data Viz Usability Checklist page contains the main checklist for you to use while building or reviewing a Power BI report.

Screenshot of Power BI Visualization Usability Checklist

The Data Viz Usability Concepts page gives you quick definitions and links for further reading about the underlying design concepts that inform my list.

Screenshot of Power BI Visualization Usability Concepts

Download the checklist here. I also have a checklist for accessibility in Power BI reports which you can find here.

If you have a suggestion to add to either list, please leave me a comment!


Microsoft Technologies, PASS Summit, Personal

Learning Better Presentation Skills (T-SQL Tuesday #108)

This month's T-SQL Tuesday is hosted by Malathi Mahadevan (@SqlMal). The topic is to pick one thing I would like to learn that is not SQL Server.

I’m going to go a different direction than I think most people will. I spend a lot of time learning new technologies in Azure, but I am also focusing on learning better presentation skills and improving my use of related technologies. Yes, that often means PowerPoint. But sometimes I do presentations directly in Power BI when I am presenting data or mostly doing demos. Building presentations requires tech, design, and speaking skills. And I enjoy that mix.

I enjoy presenting at user groups and conferences, and lately I’ve been branching out in the types of presentations I give. At PASS Summit this year, I delivered a pre-con and a general session, and I participated in a panel and the BI Power Hour.  Each one required a slightly different presentation style.

Just like we may cringe when we go back and look at old code and wonder what we were thinking, I have the same reaction when I go back and look at old presentations.

As a data viz person, you would think I would be better at building engaging presentations since a lot of the data viz concepts apply to visual presentation, but it’s still a struggle and a constant endeavor to improve.  I have made some progress to date. Below is a sample from a presentation on good report design practices in SSRS that I gave in 2015.

Old Presentation

While it’s not the worst slide I’ve ever seen, it’s definitely not the best. Here are a few slides from my PASS Summit presentation called “Do Your Data Visualizations Need A Makeover?”, which cover the same topic as the above slide.

Slides from the newer presentation

The new slides are much more pleasant and engaging. I have changed to this style based upon what I have learned so far from Echo Rivera’s blog and her Countdown to Stellar Slides program. I use more and larger images and less text on each slide. This naturally leads to having more slides, but I spend less time on each one. And there is less of a chance that I will revert to just reading from a slide since there is just less text. I get to tell you about what that one sentence means to me.

My goal is to learn how to deliver presentations that are more accessible and more engaging.

I blogged about accessible slide templates earlier this year. I got interested in accessibility when I was learning how to make more accessible Power BI reports.  I want people to feel welcome and get something out of my talks, even if they have visual, auditory, or information processing challenges. So far, what I have learned is that I can be more inclusive with a handful of small changes.

To continue my learning about presentation delivery, I plan to:

And of course, I plan to give presentations so I can try out what I learn and improve from there. I have already submitted to SQLSaturday Colorado Springs, and I’m sure I will add more presentations next year.

If you have resources that have been particularly helpful in improving your presentation delivery, please leave them in the comments.

Happy T-SQL Tuesday!


Azure, Azure Data Factory, Microsoft Technologies

Data Factory V2 Activity Dependencies are a Logical AND

Azure Data Factory V2 allows developers to branch and chain activities together in a pipeline. We define dependencies between activities as well as their dependency conditions. Dependency conditions can be succeeded, failed, skipped, or completed.

This sounds similar to SSIS precedence constraints, but there are a couple of big differences.

  1. SSIS allows us to define expressions to be evaluated to determine if the next task should be executed.
  2. SSIS allows us to choose whether we handle multiple constraints as a logical AND or a logical OR. In other words, do we need all constraints to be true, or just one?

ADF V2 activity dependencies are always a logical AND. While we can design control flows in ADF similar to how we might design control flows in SSIS, this is one of several differences. Let’s look at an example.

Data Factory V2 Pipeline with no failure dependencies

The pipeline above is a fairly common pattern. In addition to the normal ADF monitoring that is available with the product, we may log additional information to a database or file. That is what is happening in the first activity, logging the start of the pipeline execution to a database table via a stored procedure.

The second activity is a Lookup that gets a list of tables that should be loaded from a source system to a data lake. The next activity is a ForEach, executing the specified child activities for each value passed along from the list returned by the lookup. In this case the child activity includes copying data from a source to a file in the data lake.

Finally, we log the end of the pipeline execution to the database table.

Activities on Failure

This is all great as long as everything works. What if we want something else to happen in the event that one of the middle two activities fail?

This is where activity dependencies come in. Let’s say I have a stored procedure that I want to run when the Lookup or ForEach activity fails. Your first instinct might be to do the below.

Data Factory V2 Pipeline with two dependencies on failure activity

The above control flow probably won't serve you very well. The LogFailure activity will not execute unless both the Lookup activity and the ForEach activity fail. There is no way to change the dependency condition so that LogFailure executes if the Lookup OR the ForEach fails.

Instead, you have a few options:

1) Use multiple failure activities.

Pipeline with stored procedure executed when the Lookup or ForEach activity fails

This is probably the most straightforward but least elegant option. In this option you add one activity for each potential point of failure. The stored procedure you execute in the LogLookupFailure and LogForEachFailure activities may be the same, but you need the activities to be separate so there is only one dependency for execution.

2) Create a parent pipeline and use an execute pipeline activity. Then add a single failure dependency from a stored procedure to the execute pipeline activity. This works best if you don’t really care in which activity your original/child pipeline failed and just want to log that it failed.

Execute pipeline activity with a stored procedure executed on failure

3) Use an If Condition activity and write an expression that would tell you that your previous activity failed. In my specific case I might set some activity dependencies to completed instead of success and replace the LogPipelineEnd stored procedure activity with the If Condition activity. If we choose a condition that indicates failure, our If True activity would execute the failure stored procedure and our If False activity would execute the success stored procedure.

Pipeline with an If Condition activity

Think of it as a dependency, not a precedence constraint.

It’s probably better to think of activity dependencies as being different than precedence constraints. This becomes even more obvious if we look at the JSON that we would write to define this rather than using the GUI. MyActivity2 depends on MyActivity1 succeeding. If we add another dependency in MyActivity2, it would depend both on that new one and the original dependency. Each additional dependency is added on.

{
    "name": "MyPipeline",
    "properties":
    {
        "description": "pipeline description",
        "activities": [
         {
            "name": "MyActivity1",
            "type": "Copy",
            "typeProperties": {
            },
            "linkedServiceName": {
            }
        },
        {
            "name": "MyActivity2",
            "type": "Copy",
            "typeProperties": {
            },
            "linkedServiceName": {
            },
            "dependsOn": [
            {
                "activity": "MyActivity1",
                "dependencyConditions": [
                    "Succeeded"
                ]
            }
          ]
        }
      ],
      "parameters": {
       }
    }
}

Do you have another way of handling this in Data Factory V2? Let me know in the comments.

If you would like to see Data Factory V2 change to let you choose how to handle multiple dependencies, you can vote for this idea on the Azure feedback site or log your own idea to suggest a different enhancement to better handle this in ADF V2.

Conferences, Microsoft Technologies, PASS Summit, Personal

Join Me At PASS Summit 2018

The PASS Summit 2018 schedule has been published, and I'm on it twice! On Monday, November 5, I am giving a full-day pre-con with Melissa Coates on Designing Modern Data and Analytics Solutions in Azure. We'll have presentations, hands-on labs, and open discussions about architecture options in Azure when building an analytics solution. If you've been wondering how your architecture should change when moving from on-premises solutions to PaaS solutions, when to use SSIS versus ADF V2, options for data virtualization, or what kind of data storage technology to use, we would love to have you attend our pre-con.

I also have a general session at PASS Summit: “Do Your Data Visualizations Need a Makeover?” I'll share the signs that your data visualizations aren't providing a good experience for your users, explain the most common reasons why, and give you tips on how to fix it. Data visualization is a skill that must be learned and that we all should continue to sharpen. We'll have some fun discussing common mistakes and looking at examples. This session is scheduled for Wednesday, November 7 at 4:45pm.

If you are on the fence about attending PASS Summit, I highly recommend it, especially if you have never been. There are so many benefits for data professionals:

The content:

  • There are opportunities to learn from some of the top Microsoft Data Platform experts in the world.
  • Microsoft product and customer advisory teams have a large presence at the conference, so you can ask them questions and get advice.
  • The wide array of content allows you to go deeper on topics with which you are already familiar, or to get an intro to an adjacent topic that just wasn't clicking for you from blog posts or books.

The networking:

  • You get to meet data professionals from all over the world. You can make new professional contacts and friends with whom you can keep in touch afterwards.
  • If you are looking for a new job, it’s a great place to make connections.
  • You can talk to the speakers whose blogs you read and conference sessions you attend! If you spot your favorite speaker at PASS Summit, it is a great place to introduce yourself or ask a question.

The fun:

  • There are lots of community events, including happy hours, game nights, and more.
  • There is always something to do for dinner, between receptions, sponsor parties, and friendly groups to tag along with.
  • SQL Karaoke is happening somewhere pretty much every night.

These benefits are most definitely real, not just over-hyped advertising. I have friends and colleagues of many years that I first met at PASS Summit. I first met Melissa Coates at PASS Summit, and now I work with her and present with her. And I got to help edit the Power BI whitepaper she wrote with Chris Webb (whom I also first met at PASS Summit – I fan-girled a little and asked for his autograph on his Power Query book the year it came out). One year, I got job interviews after letting colleagues at PASS Summit know that I was looking. I had a blast singing karaoke with a live band at an evening event last year. I could continue this list for quite a while, but I think you get the picture.

If you will be attending PASS Summit for the first time, check out the attendee orientation from Denny Cherry on October 2nd as well as the buddy program and speed networking event at the conference.

If you haven’t registered for PASS Summit yet but are planning on it, check with your local SQL/Power BI user group or a PASS virtual chapter for discount codes.

I hope to see you there.