I’m honored to have one of my PASS Summit sessions chosen to be part of the PASS Data Expert Series on February 7. PASS has curated the top-rated, most impactful sessions from PASS Summit 2018 for a day of solutions and best practices to help keep you at the top of your field. There are three tracks: Analytics, Data Management, and Architecture. My session is in the Analytics track along with some other great sessions from Alberto Ferrari, Jen Underwood, Carlos Bossy, Matt How, and Richard Campbell.
The video for my session, titled “Do Your Data Visualizations Need a Makeover?”, starts at 16:00 UTC (9 AM MT). I’ll be online in the webinar for live Q&A and chat related to the session.
I hope you’ll register and chat with me about data visualizations in need of a makeover on February 7.
I started this blog in 2013 during my first consulting job. In 2014, I joined BlueGranite, which is where I have been for the last 4 years. The knowledge gained while working with them has been the inspiration for a lot of the content on this blog. It has truly been a pleasure to work at BlueGranite. Work is fun when your leadership team is trustworthy and transparent and open to feedback, and your coworkers are smart and motivated and want to help you learn. I can’t express how much I appreciate the professional relationships and friendships formed during my time with them.
But an interesting opportunity presented itself to me, and I have decided to take it.
Today is my first day working for Denny Cherry & Associates! I’m excited to be working with Denny (B|T), Joey (B|T), Kerry (B|T), Monica (B|T), John (B|T), and Peter (T). I’ve known the consultants at Denny Cherry & Associates for years through the SQL Community. My chats with them at conferences and on Twitter demonstrated their incredible SQL and Azure knowledge, and I can’t wait to learn from them. I might even teach them a thing or two about BI/Analytics. I feel very fortunate to work for a company that values and encourages participation in the technical community. There are five Microsoft MVPs (including me), and all six consultants are speakers and bloggers. DCAC does a healthy mix of implementations, migrations, health checks, support, and training. They have completed many interesting and impactful Azure projects, which was a big draw for me.
Otherwise, things should be business as usual for me: work from home with the bulldog and a little travel as needed. I’ll continue to blog here in my free time with new coworkers and projects to serve as my inspiration.
When we talk about Azure architectures for data warehousing or analytics, we usually show a diagram that looks like the below.
This diagram is a great start to explain what services will be used in Azure to build out a solution or platform. But many times, we add the specific resource names and stop there. If you have built several projects in Azure, you will know there are some other things for which you will need to plan. So what’s missing?
Azure Active Directory
Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.
For instance, users who will participate in Data Factory pipeline development must be assigned to the Data Factory Contributor role in Azure. Users can authenticate via AAD to log in to Azure SQL DW. You can use AD Groups to grant permissions to users interacting with various Azure resources such as Azure SQL DW or SQL DB as well as Azure Analysis Services and Power BI. It’s more efficient to manage permissions for groups than to do it for each user. You’ll need a plan to manage group membership.
You may also need to register applications in Azure Active Directory to allow Azure services to authenticate and interact with each other. While the guidance for many services is now to use managed identities, this may not be available for every situation just yet.
Virtual Networks
Virtual Networks (or VNets) allow many types of Azure resources to securely communicate with each other, the internet, and on-premises networks. You can have multiple virtual networks in an Azure subscription. Each virtual network is isolated from other VNets unless you set up VNet peering.
Some PaaS services such as Azure Storage Accounts, Azure SQL DW, and Azure Analysis Services support Virtual Network Service Endpoints. A common scenario is to set up VNets and VNet Service Endpoints to connect resources to on-premises networks. Some organizations prefer VNet Service Endpoints over public service endpoints, so that traffic can reach the resource only from within the organization’s local network.
In order to connect a VNet to an on-premises network or another VNet (outside of peering), you’ll need a VPN Gateway. You’ll need to identify the most appropriate type of gateway connection: Point-to-Site, Site-to-Site, or ExpressRoute. These offerings differ based on bandwidth, protocols supported, routing, connection resiliency, and SLAs. Pricing can vary greatly based upon your gateway type.
While VNets and VPN Gateways are probably the most common networking resources in Azure, there are many other networking services and related design decisions to consider as you plan an Azure deployment.
Data Gateways
Your BI solution may be entirely in Azure, but if you need to retrieve data from data sources in a private network (on premises or on a VNet), you’ll need a gateway. If you are using Azure Data Factory, you’ll need a Self-hosted Integration Runtime (IR). If the source for your Power BI or Azure Analysis Services model is on a private network, you’ll need an On-Premises Data Gateway. You can use the same gateway to connect to Analysis Services, Power BI, Logic Apps, Power Apps, and Flow. If you will have a mix of Analysis Services and Power BI models sharing the same gateway, you’ll need to make sure that your Power BI region and your AAS region match.
These gateways require that you have a machine inside the private network on which to install them. And if you want to scale out or have failover capabilities, you may need multiple gateways and multiple VMs. So while you may be building a solution in Azure, you might end up with a few on-premises VMs to be able to securely move source data.
Dev and Test Environments
Our nice and tidy diagram above is only showing production. We also need at least a development environment and maybe one or more test environments. You’ll need to decide how to design your dev/test environments. You may want duplicate resources in a separate resource group; e.g., a dev resource group that contains a dev Data Factory, a dev Azure SQL DW, a dev Azure Analysis Services, etc. While separating environments by resource group is common, it’s not your only option. You will need to decide if you prefer to separate environments by resource group, subscription, directory, or some combination of the three.
You’ll also want to investigate ways to keep the costs of non-prod environments down via smaller-sized resources or pausing or deleting resources where applicable.
Plan For, Don’t Worry About, the Extras
There are several other ancillary Azure services that could or should be part of your solution.
For source control, GitHub and Azure Repos have the easiest integration, especially with Azure Data Factory. You’ll not only want source control for things like database projects and Data Factory pipelines, but also possibly for ARM templates and PowerShell scripts used to deploy resources (think: infrastructure as code).
You’ll want to set up monitoring with Azure Monitor, including alerts that let you know when processes and services are not running as intended.
If you want more cost management support than just setting a spending limit on a subscription (if it is available for your subscription type), it may be helpful to set budgets in Azure so you can be notified when you hit certain percentages or amounts.
Be sure to think through the entire data/solution management lifecycle. You may want to include extra services for cataloging, governing, and archiving your data.
This may sound like a complex list, but these resources and services are fairly easy to work with. Azure Active Directory has a user-friendly GUI in the portal, and commands can be automated with PowerShell, requiring relatively little effort to manage users and groups to support your BI environment. VNets and VPN Gateways are a little more complex, but there are step-by-step instructions available for many tasks in the Microsoft Docs. The Power BI Gateway and ADF IR have quick and easy GUI installers that just take a few minutes. You can automate Azure deployments with Azure Pipelines or PowerShell scripts.
None of these things are really that awful to implement or manage in Azure, unless you weren’t aware of them and your project is paused until you can get them set up.
Is there anything else you find is commonly left out when planning for BI solutions in Azure? Leave me a comment if you would like to add to the list.
Update (1/26/19): Helpful readers have commented on other aspects of your Azure BI architecture they felt were often overlooked:
Make sure you have a plan for how to process your Azure Analysis Services model. Will you use Azure Automation? Call an Azure Function from Data Factory?
Be sure to organize and tag your resources appropriately to help you understand and control costs.
Don’t forget Azure Key Vault. This will help you keep keys and passwords secure.
(Thanks to Chad Toney, Shannon Holck, and Santiago Cepas for these suggestions.)
Like a diamond in the sky. How I wonder what you are! Twinkle, twinkle, little star, Twinkle, twinkle, little star, Up above the world so high, How I wonder what you are!
You were probably expecting a different order. Order is an important element when singing a song or telling a story or explaining information.
In western cultures we tend to read left to right, top to bottom, making a Z-pattern. This applies to books and blogs as well as reports and data visualizations. But if you are using the keyboard to navigate a Power BI report, the order in which you arrive at visuals will not follow your vision unless you set the new tab order property. If you have low or no vision, this becomes an even bigger issue: you may not be able to see that you are navigating visuals out of order, because the screen reader just reads whatever comes next.
It takes effort to consume each visual, and many visuals need the context of the other visuals around them to be most useful. When we present information out of order, we are putting more cognitive load on our users, forcing them to hold information in their limited working memory until they arrive at another visual that helps put the pieces together to make sense.
What is Tab Order?
Tab order is the order in which users interact with the items on a page using the keyboard. Generally, we want tab order to be predictable and to closely match the visual order on the page (unless there is a good reason to deviate). If you press the tab key in a Power BI report, you might be surprised at the seemingly random order in which you move from visual to visual on the page. The default order is the order in which the visuals were placed on the page in Power BI Desktop, or the last modified order in PowerBI.com if you have edited your report there.
To set the tab order of visuals on a report page in Power BI Desktop, go to the View tab, open the Selection Pane and select Tab Order at the top of the Selection Pane.
From there, you can move visuals up and down in order, or hide them from tab order completely. This is helpful if you have decorative items on the page whose selection has no value to the user.
To change the tab order, you can either drag an item to a new position in the list, or you can select the item and click the up or down arrows above the list.
In case you missed it, slicers are now keyboard accessible. If you would like users to select values in slicers before using the other visuals on the page, make sure to put the slicers early in the tab order.
It only takes a minute to set the tab order, but it greatly increases usability for keyboard users.
At PASS Summit, I presented a session called “Do Your Data Visualizations Need a Makeover?”. In my session, I explained how, by not preparing appropriately, we often set ourselves up for failure in explanatory data visualization before we ever place a visual on the page, and I provided tips to improve. I also gave examples of visual design mistakes I see often. I polled the audience, and they shared some mistakes that they had seen often or that really bothered them. If you missed my presentation, you can watch it on PASS TV or YouTube.
As a companion to my presentation, I created the Power BI Visualization Usability Checklist. For those who are new to data visualization in Power BI, or those who want to employ some type of quality check, I think this is a good place to start. I occasionally do data viz makeover engagements to help people create a report that is more engaging and more widely adopted. This list draws from that experience as well as the tweaks I find myself making to my own Power BI reports. And now I have added a few things that my PASS Summit audience mentioned – thanks to those who shared their suggestions and experiences!
I’m not here to tell you to always use a certain color theme or font, or that everything should be a bar chart. Data visualization is situational and dependent upon your intended audience. I hope I can encourage you to consider your audience, how they take in information, and what information they are looking for.
This checklist provides guidelines to help make sure your report communicates your intended message in a way that works for your intended audience. It has two pages. The Data Viz Usability Checklist page contains the main checklist for you to use while building or reviewing a Power BI report.
The Data Viz Usability Concepts page gives you quick definitions and links for further reading about the underlying design concepts that inform my list.
Download the checklist here. I also have a checklist for accessibility in Power BI reports which you can find here.
If you have a suggestion to add to either list, please leave me a comment!
I’m going to go a different direction than I think most people will. I spend a lot of time learning new technologies in Azure, but I am also focusing on learning better presentation skills and improving my use of related technologies. Yes, that often means PowerPoint. But sometimes I do presentations directly in Power BI when I am presenting data or mostly doing demos. Building presentations requires tech, design, and speaking skills. And I enjoy that mix.
I enjoy presenting at user groups and conferences, and lately I’ve been branching out in the types of presentations I give. At PASS Summit this year, I delivered a pre-con and a general session, and I participated in a panel and the BI Power Hour. Each one required a slightly different presentation style.
Just like we may cringe when we go back and look at old code and wonder what we were thinking, I have the same reaction when I go back and look at old presentations.
As a data viz person, you would think I would be better at building engaging presentations since a lot of the data viz concepts apply to visual presentation, but it’s still a struggle and a constant endeavor to improve. I have made some progress to date. Below is a sample from a presentation on good report design practices in SSRS that I gave in 2015.
While it’s not the worst slide I’ve ever seen, it’s definitely not the best. Here are a few slides from my PASS Summit presentation called “Do Your Data Visualizations Need A Makeover?”, which cover the same topic as the above slide.
The new slides are much more pleasant and engaging. I have changed to this style based upon what I have learned so far from Echo Rivera’s blog and her Countdown to Stellar Slides program. I use more and larger images and less text on each slide. This naturally leads to having more slides, but I spend less time on each one. And there is less of a chance that I will revert to just reading from a slide since there is just less text. I get to tell you about what that one sentence means to me.
My goal is to learn how to deliver presentations that are more accessible and more engaging.
I blogged about accessible slide templates earlier this year. I got interested in accessibility when I was learning how to make more accessible Power BI reports. I want people to feel welcome and get something out of my talks, even if they have visual, auditory, or information processing challenges. So far, what I have learned is that I can be more inclusive with a handful of small changes.
To continue my learning about presentation delivery, I plan to:
And of course, I plan to give presentations so I can try out what I learn and improve from there. I have already submitted to SQLSaturday Colorado Springs, and I’m sure I will add more presentations next year.
If you have resources that have been particularly helpful in improving your presentation delivery, please leave them in the comments.
Azure Data Factory V2 allows developers to branch and chain activities together in a pipeline. We define dependencies between activities as well as their dependency conditions. Dependency conditions can be Succeeded, Failed, Skipped, or Completed.
SSIS allows us to define expressions to be evaluated to determine if the next task should be executed.
SSIS allows us to choose whether we handle multiple constraints as a logical AND or a logical OR. In other words, do we need all constraints to be true, or just one?
ADF V2 activity dependencies are always a logical AND. While we can design control flows in ADF similar to how we might design control flows in SSIS, this is one of several differences. Let’s look at an example.
The pipeline above is a fairly common pattern. In addition to the normal ADF monitoring that is available with the product, we may log additional information to a database or file. That is what is happening in the first activity, logging the start of the pipeline execution to a database table via a stored procedure.
The second activity is a Lookup that gets a list of tables that should be loaded from a source system to a data lake. The next activity is a ForEach, executing the specified child activities for each value passed along from the list returned by the lookup. In this case the child activity includes copying data from a source to a file in the data lake.
Finally, we log the end of the pipeline execution to the database table.
Activities on Failure
This is all great as long as everything works. What if we want something else to happen in the event that one of the middle two activities fail?
This is where activity dependencies come in. Let’s say I have a stored procedure that I want to run when the Lookup or ForEach activity fails. Your first instinct might be to do the below.
The above control flow probably won’t serve you very well. The LogFailure activity will not execute unless both the Lookup activity and the ForEach activity fail. There is no way to change the dependency condition so that LogFailure executes if the Lookup OR the ForEach fails.
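To make the AND behavior concrete, here is a sketch of what the LogFailure activity might look like in the pipeline JSON (activity names come from the example above; the stored procedure type properties are omitted for brevity). Because every entry in the dependsOn array must be satisfied, this activity runs only when both upstream activities fail:

```json
{
  "name": "LogFailure",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    { "activity": "Lookup", "dependencyConditions": [ "Failed" ] },
    { "activity": "ForEach", "dependencyConditions": [ "Failed" ] }
  ]
}
```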
Instead, you have a few options:
1) Use multiple failure activities.
This is probably the most straightforward but least elegant option. In this option, you add one activity for each potential point of failure. The stored procedure you execute in the LogLookupFailure and LogForEachFailure activities may be the same, but you need the activities to be separate so there is only one dependency for execution.
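As a sketch in abbreviated pipeline JSON (activity names match the example; type properties are trimmed), each failure-logging activity gets exactly one dependency, so either one can fire on its own:

```json
[
  {
    "name": "LogLookupFailure",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [ { "activity": "Lookup", "dependencyConditions": [ "Failed" ] } ]
  },
  {
    "name": "LogForEachFailure",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [ { "activity": "ForEach", "dependencyConditions": [ "Failed" ] } ]
  }
]
```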
2) Create a parent pipeline and use an execute pipeline activity. Then add a single failure dependency from a stored procedure to the execute pipeline activity. This works best if you don’t really care in which activity your original/child pipeline failed and just want to log that it failed.
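In abbreviated JSON, the relevant activities in the parent pipeline might look something like this (the child pipeline name is hypothetical; note that waitOnCompletion should be true so the child pipeline’s failure propagates to the Execute Pipeline activity):

```json
[
  {
    "name": "ExecuteChildPipeline",
    "type": "ExecutePipeline",
    "typeProperties": {
      "pipeline": { "referenceName": "LoadFromSourceToLake", "type": "PipelineReference" },
      "waitOnCompletion": true
    }
  },
  {
    "name": "LogFailure",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [ { "activity": "ExecuteChildPipeline", "dependencyConditions": [ "Failed" ] } ]
  }
]
```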
3) Use an If Condition activity and write an expression that would tell you that your previous activity failed. In my specific case I might set some activity dependencies to completed instead of success and replace the LogPipelineEnd stored procedure activity with the If Condition activity. If we choose a condition that indicates failure, our If True activity would execute the failure stored procedure and our If False activity would execute the success stored procedure.
Think of it as a dependency, not a precedence constraint.
It’s probably better to think of activity dependencies as being different from precedence constraints. This becomes even more obvious if we look at the JSON we would write to define the pipeline rather than using the GUI. MyActivity2 depends on MyActivity1 succeeding. If we add another dependency to MyActivity2, MyActivity2 then depends on both the new dependency and the original one; each additional dependency must also be satisfied.
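For example, in the dependsOn array below (property names per the ADF pipeline JSON schema; MyOtherActivity is a hypothetical second upstream activity), MyActivity2 runs only after both listed dependencies are met:

```json
{
  "name": "MyActivity2",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    { "activity": "MyActivity1", "dependencyConditions": [ "Succeeded" ] },
    { "activity": "MyOtherActivity", "dependencyConditions": [ "Succeeded" ] }
  ]
}
```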
Do you have another way of handling this in Data Factory V2? Let me know in the comments.
If you would like to see Data Factory V2 change to let you choose how to handle multiple dependencies, you can vote for this idea on the Azure feedback site or log your own idea to suggest a different enhancement to better handle this in ADF V2.