Azure, Azure Data Factory, Microsoft Technologies

Data Factory V2 Activity Dependencies are a Logical AND

Azure Data Factory V2 allows developers to branch and chain activities together in a pipeline. We define dependencies between activities as well as their dependency conditions. Dependency conditions can be succeeded, failed, skipped, or completed.

This sounds similar to SSIS precedence constraints, but there are a couple of big differences.

  1. SSIS allows us to define expressions to be evaluated to determine if the next task should be executed.
  2. SSIS allows us to choose whether we handle multiple constraints as a logical AND or a logical OR. In other words, do we need all constraints to be true or just one?

ADF V2 activity dependencies are always a logical AND. While we can design control flows in ADF similar to how we might design control flows in SSIS, this is one of several differences. Let’s look at an example.

Data Factory V2 Pipeline with no failure dependencies

The pipeline above is a fairly common pattern. In addition to the normal ADF monitoring that is available with the product, we may log additional information to a database or file. That is what is happening in the first activity, logging the start of the pipeline execution to a database table via a stored procedure.

The second activity is a Lookup that gets a list of tables that should be loaded from a source system to a data lake. The next activity is a ForEach, executing the specified child activities for each value passed along from the list returned by the lookup. In this case the child activity includes copying data from a source to a file in the data lake.

Finally, we log the end of the pipeline execution to the database table.

Activities on Failure

This is all great as long as everything works. What if we want something else to happen in the event that one of the middle two activities fails?

This is where activity dependencies come in. Let’s say I have a stored procedure that I want to run when the Lookup or ForEach activity fails. Your first instinct might be to do the below.

Data Factory V2 Pipeline with two dependencies on failure activity

The above control flow probably won’t serve you very well. The LogFailure activity will not execute unless both the Lookup activity and the ForEach activity fail. There is no way to change the dependency condition so that LogFailure executes if the Lookup OR the ForEach fails.
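To make the logical AND concrete, here is a minimal Python sketch of how the dependency check behaves. It is a toy model, not the actual ADF engine: the function name `should_run` is hypothetical, the activity names are taken from the example above, and I am assuming that multiple conditions listed inside a single dependency act as alternatives.

```python
# Toy model of ADF V2 dependency evaluation (illustrative only, not the real engine).

def should_run(depends_on, outcomes):
    """Return True only if EVERY dependency is satisfied (logical AND).

    depends_on: list of {"activity": str, "dependencyConditions": [str, ...]}
    outcomes:   dict of activity name -> "Succeeded", "Failed", or "Skipped"
    """
    def satisfied(dep):
        outcome = outcomes[dep["activity"]]
        conditions = dep["dependencyConditions"]
        # "Completed" is met by either a success or a failure.
        if "Completed" in conditions and outcome in ("Succeeded", "Failed"):
            return True
        # Assumption: conditions within one dependency are alternatives.
        return outcome in conditions

    # Dependencies on different activities are combined with AND.
    return all(satisfied(dep) for dep in depends_on)


# LogFailure depends on BOTH Lookup and ForEach having failed.
log_failure_deps = [
    {"activity": "Lookup", "dependencyConditions": ["Failed"]},
    {"activity": "ForEach", "dependencyConditions": ["Failed"]},
]

# Only the ForEach failed: LogFailure does not run.
print(should_run(log_failure_deps, {"Lookup": "Succeeded", "ForEach": "Failed"}))  # False
# Both failed: LogFailure runs.
print(should_run(log_failure_deps, {"Lookup": "Failed", "ForEach": "Failed"}))  # True
```

A single failed activity is not enough to trigger LogFailure in this model, which is exactly the problem with the design above.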

Instead, you have a few options:

1) Use multiple failure activities.

Pipeline with stored procedure executed when the Lookup or ForEach activity fails

This is probably the most straightforward but least elegant option. In this option you add one activity for each potential point of failure. The stored procedure you execute in the LogLookupFailure and LogForEachFailure activities may be the same, but you need the activities to be separate so there is only one dependency for execution.

2) Create a parent pipeline and use an Execute Pipeline activity to call the original pipeline. Then add a stored procedure activity with a single failure dependency on the Execute Pipeline activity. This works best if you don’t really care in which activity your original/child pipeline failed and just want to log that it failed.

Execute pipeline activity with a stored procedure executed on failure

3) Use an If Condition activity and write an expression that would tell you that your previous activity failed. In my specific case I might set some activity dependencies to completed instead of success and replace the LogPipelineEnd stored procedure activity with the If Condition activity. If we choose a condition that indicates failure, our If True activity would execute the failure stored procedure and our If False activity would execute the success stored procedure.

PipelineWithIf

Think of it as a dependency, not a precedence constraint.

It’s probably better to think of activity dependencies as being different from precedence constraints. This becomes even more obvious if we look at the JSON that we would write to define the pipeline rather than using the GUI. In the example below, MyActivity2 depends on MyActivity1 succeeding. If we added another dependency to MyActivity2, it would depend on both the new one and the original. Each additional dependency is one more condition that must be met.

{
    "name": "MyPipeline",
    "properties": {
        "description": "pipeline description",
        "activities": [
            {
                "name": "MyActivity1",
                "type": "Copy",
                "typeProperties": {},
                "linkedServiceName": {}
            },
            {
                "name": "MyActivity2",
                "type": "Copy",
                "typeProperties": {},
                "linkedServiceName": {},
                "dependsOn": [
                    {
                        "activity": "MyActivity1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ]
            }
        ],
        "parameters": {}
    }
}

Do you have another way of handling this in Data Factory V2? Let me know in the comments.

If you would like to see Data Factory V2 change to let you choose how to handle multiple dependencies, you can vote for this idea on the Azure feedback site or log your own idea to suggest a different enhancement to better handle this in ADF V2.

Conferences, Microsoft Technologies, PASS Summit, Personal

Join Me At PASS Summit 2018

The PASS Summit 2018 schedule has been published, and I’m on it twice! On Monday, November 5, I am giving a full-day pre-con with Melissa Coates on Designing Modern Data and Analytics Solutions in Azure. We’ll have presentations, hands-on labs, and open discussions about architecture options in Azure when building an analytics solution. If you’ve been wondering how your architecture should change when moving from on-premises solutions to PaaS solutions, when to use SSIS versus ADF V2, options for data virtualization, or what kind of data storage technology to use, we would love to have you attend our pre-con.

I also have a general session at PASS Summit: Do Your Data Visualizations Need A Makeover? I’ll share the signs that your data visualizations aren’t providing a good experience for your users, explain the most common reasons why, and give you tips on how to fix them. Data visualization is a skill that must be learned and that we all should continue to sharpen. We’ll have some fun discussing common mistakes and looking at examples. This session is scheduled for Wednesday, November 7 at 4:45pm.

If you are on the fence about attending PASS Summit, I highly recommend it, especially if you have never been. There are so many benefits for data professionals:

The content:

  • There are opportunities to learn from some of the top Microsoft Data Platform experts in the world.
  • Microsoft product and customer advisory teams have a large presence at the conference, so you can ask them questions and get advice.
  • The wide array of content allows you to go deeper on topics with which you are already familiar, or to get an intro to an adjacent topic that just wasn’t clicking for you from reading blog posts or books.

The networking:

  • You get to meet data professionals from all over the world. You can make new professional contacts and friends with whom you can keep in touch afterwards.
  • If you are looking for a new job, it’s a great place to make connections.
  • You can talk to the speakers whose blogs you read and conference sessions you attend! If you spot your favorite speaker at PASS Summit, it is a great place to introduce yourself or ask a question.

The fun:

  • There are lots of community events, including happy hours, game nights, and more.
  • There is always something to do for dinner, between receptions, sponsor parties, and friendly groups to tag along with.
  • SQL Karaoke is happening somewhere pretty much every night.

These benefits are most definitely real, not just over-hyped advertising. I have friends and colleagues of many years whom I first met at PASS Summit. I first met Melissa Coates at PASS Summit, and now I work with her and present with her. And I got to help edit the Power BI whitepaper she wrote with Chris Webb (whom I also first met at PASS Summit – I fan-girled a little and asked for his autograph on his Power Query book the year it came out). One year, I got job interviews after letting colleagues at PASS Summit know that I was looking. I had a blast singing karaoke with a live band at an evening event last year. I could continue this list for quite a while, but I think you get the picture.

If you will be attending PASS Summit for the first time, check out the attendee orientation from Denny Cherry on October 2nd as well as the buddy program and speed networking event at the conference.

If you haven’t registered for PASS Summit yet but are planning on it, check with your local SQL/Power BI user group or a PASS virtual chapter for discount codes.

I hope to see you there.

Data Visualization, Microsoft Technologies, Power BI

Considerations for Using Layout Images in Power BI

Using layout images in Power BI has become a popular design trend. When I say layout images, I’m referring to background images with shapes around areas where visuals are placed. This is different from the new wallpaper feature that became available in the July release, which can be used to format the grey area outside your report page and extend the main color of background images.

Layout images can help with spacing and alignment within a report and can help create consistency across reports. They can also help create affordances, using consistent layout and design to make it obvious how users should interact with our reports.

I use layout images in some of my reports, but I don’t think they are necessary on every report. There are a couple of things to consider when using layout images.

  1. Don’t let your layout image take the focus away from the data. This can happen due to lack of color contrast or because the color(s) used in the layout image are much more intense than the visuals on your report page.
  2. While we may strive for consistency in report design, especially in larger organizations, we can’t let a layout keep us from creating the most effective visual to communicate the information in our data. If we start with a layout and limit ourselves to only visuals that fit that layout, that constraint may prevent us from creating a better report. If you have identified the chart type you need to communicate your information, but it doesn’t fit in your 3-column layout because the visual needs to be a bit wider, get a new template that accommodates your visual. I would rather see slightly different templates than ineffective chart types or the right chart types but the visuals squished into a space where they don’t fit (hard to read, truncated labels, etc.).

You can make your own layout images or choose one that someone else has created. PowerBI.Tips offers Power BI layouts in the form of Power BI Templates (.PBIT files) with background images set on the report pages where you replace the data source and repopulate the visuals. The templates also contain report themes.

Frederik Hedenstrom has a grid generator where you can set the width, height, columns, spacing, and colors. Then you can download your image and set it as your page background.

When I make my own layouts, I typically just use PowerPoint. Usually, the layout is the last thing I add rather than the first, but you can do what works best for you. This is the process I use to make layout images:

  1. Take a screenshot of my report page and paste it onto a blank PowerPoint slide.
  2. Draw shapes (usually rectangles or rectangles with rounded corners) over the screenshot where the visuals are placed.
  3. Delete the screenshot from the slide.
  4. Format the slide background and the shapes (alignment, colors, outline, shadow effects, etc.).
  5. Export the slide as an image.
  6. Import the image as the page background. Adjust the image fit and transparency as needed.

Doing it myself is a bit of work, but it means I get exactly the effect I want. And I’ve gotten pretty quick at it now that I have done it several times. But again, you can take shortcuts using the resources mentioned above. Once I have my background layout image set, I check that it isn’t too distracting by asking myself these questions:

  • Is the background so much more intense than the visuals that I look at it before I look at the data visualizations?
  • Do my visuals no longer stand out because the background color is too similar to the colors in my visuals?
  • Can I still clearly read my charts, including all titles and labels now that the background image is in place?

To help illustrate my points, take a look at this example report I have been working on.

Version 1: No Layout Image
Version 2: Layout Image Applied
Version 3: Layout Image Too Distracting

In Version 1, there is no layout image. Some might think the report looks a tad bare. While there is a layout of 3 rows, it’s not immediately obvious.

In Version 2, I have applied a layout image. It is subtle, using a light gray background color and a soft shadow around the shapes. It emphasizes the 3 rows, which makes sense in this report. In the top row I have the title and some summary numbers. In the middle row, I’m slicing number of reviews and % recommended by high-level categories. In the bottom row I have one visual that slices average and median ratings by a more detailed category.

In Version 3, I changed the background to a dark gray/light black. With that background, the dark color is the thing in the report that stands out most to me, but it provides no information and doesn’t enhance the user experience more than the subtle light version of the layout image.

Final Thoughts

Layout images can be useful. You can save time by using images created by others, but don’t let a layout needlessly constrain your data visualization or distract from the information in your report.

 

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Power BI Report Accessibility Checklist

In many cases, some small changes can go a long way in making your Power BI reports more accessible for users with different abilities. The checklist below lists considerations you should make in your report design to create more inclusive reports. I’ll update this post as new features are released.

Accessibility Checklist

All Visuals

  • Ensure the color contrast between title, axis label, and data label text and the background is at least 4.5:1.
  • Avoid using color as the only means of conveying information. Use text or icons to supplement or replace the color.
  • Replace unnecessary jargon or acronyms.
  • Ensure alt text is added to all non-decorative visuals on the page.
  • Check that your report page works for users with color vision deficiency.
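The 4.5:1 contrast threshold can be checked in code as well as with online tools. The sketch below uses the relative luminance and contrast ratio formulas from WCAG 2.x, which are not part of this checklist itself; the function names are my own, and it assumes colors are given as 6-digit hex codes.

```python
# Contrast ratio between two hex colors, following the WCAG 2.x formulas.

def _linear_channel(value_8bit):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = value_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color like '#1A2B3C' (0 = black, 1 = white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _linear_channel(r)
            + 0.7152 * _linear_channel(g)
            + 0.0722 * _linear_channel(b))


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_text_contrast(text_color, background_color, minimum=4.5):
    """True if the pair meets the 4.5:1 minimum used in this checklist."""
    return contrast_ratio(text_color, background_color) >= minimum


print(meets_text_contrast("#767676", "#FFFFFF"))  # True (about 4.54:1)
print(meets_text_contrast("#777777", "#FFFFFF"))  # False (about 4.48:1)
```

The two grays in the usage lines differ by a single step per channel, which shows how close to the threshold a medium gray on white can sit.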

Slicers

  • If you have a collection of several slicers on your report pages, ensure your design is consistent across pages. Use the same font, colors, and spatial position as much as possible.

Textbox

  • Ensure the color contrast between font and background is at least 4.5:1.

Visual Interactions

  • Is key information only accessible through an interaction? If so, rearrange your visuals so they are pre-filtered to make the important conclusion more obvious.
  • Are you using bookmarks for navigation? Keyboard users can’t select images to go to a hyperlink or bookmark. Try navigating your report with a keyboard to ensure the experience is acceptable for keyboard-only users.

Sort Order

  • Have you purposefully set the sort order of each visual on the page? The accessible Show Data table shows the data in the sort order you have set on the visual.

Tooltips

  • Don’t use tooltips to convey important information. Users with motor issues and users who do not use a mouse will have difficulties accessing them.
  • Do add tooltips to charts as ancillary information. They are included in the accessible Show Data table for each visual.

Video

  • Avoid video that automatically starts when the page is rendered.
  • Ensure your video has captions or provide a transcript.

Audio

  • Avoid audio that automatically starts when the page is rendered.
  • Provide a transcript for any audio.

Shapes

  • Avoid using too many decorative shapes. They are announced by the screen reader when reading the page.
  • When using shapes to call out data points, use alt text to explain what is being called out.

Images

  • When using images to call out data points, use alt text to explain what is being called out.
  • Avoid using too many decorative images. They are announced by the screen reader when reading the page.

Custom Visuals

  • Check the accessible Show Data table for custom visuals. If the information shown is not sufficient, look for another visual.
  • If using the Play Axis custom visual, ensure it does not autoplay. Make it obvious that the user must press the play/pause button to start/stop the changing values.

Power BI Accessibility Features

Tools to Check Accessibility in your Power BI Report

Keyboard Only Navigation

  • Use a keyboard to navigate and interact with your report, without using a mouse.

Color Vision Deficiency

Low Vision

  • Use a mobile device with brightness on low to test mobile reports
  • Use WebAIM or Accessible Colors to check color contrast of text vs background
  • Use The Squint Test to check that a Power BI report makes sense to someone with low vision

If you would like to suggest an update to the list, feel free to leave a comment on this post.

Data Visualization, Microsoft Technologies, Power BI

Choosing a Color Palette For Your Power BI Report

Color is a powerful attribute in data visualization. In a good visualization, it can focus attention and enhance meaning and clarity. When color is used poorly, it creates clutter and confusion. Power BI has a default color palette, but it isn’t always optimal or even appropriate for many reports. Luckily, Power BI allows you to use any color that can be defined by a hex code where visuals allow colors to be changed. With so many choices, choosing a color palette can be overwhelming. Below are some tips to help you choose a good color palette for your Power BI reports.

A color palette is simply a collection of colors applied to the visual elements in your report. What we typically refer to as color is a combination of three main properties: hue (base color on the color wheel), intensity (brightness or gray-ness) and value (lightness or darkness). You can build an engaging and professional looking report with just 6 colors. It’s possible to have fewer colors or more colors, but 6 should cover many common visualization needs. If you are using more than 6 colors, you might want to check that you are optimizing engagement and cognitive load.

  1. Main color – default color on graphs
  2. Color 2 – used when multiple colors are needed in a graph or report
  3. Color 3 – used when multiple colors are needed in a graph or report and Color 2 has already been used
  4. Highlight color – a color used to highlight important data points to make them stand out from other points on the page
  5. Border color – a light color used for borders on tables and KPIs where necessary
  6. Title color – color used for visual titles and axis labels as appropriate

Example Power BI color palette with 6 colors

While your title and border colors don’t have to be variations of gray, gray is a practical color for these purposes when using a white background. You could also use brown, blue, purple, etc. You just want to ensure that your text color has sufficient contrast from its background and that it isn’t more intense than your data colors. I tend to make my border color a tint of the title color.

This is a good place to define a few terms. A hue is a base color without black, white, or gray added, which you might find on a basic color wheel. A shade is achieved by adding black to any pure hue, making it darker. A tint is achieved by adding white to a pure hue, making it lighter. A tone is achieved by adding gray to a pure hue, making it less saturated and more muted.
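If you want to see what those definitions mean for hex codes, here is a small Python sketch. It treats "adding" white, black, or gray as a simple linear blend of the RGB channels, which is one common approximation and an assumption on my part; tools such as ColorHexa may calculate their variations differently. The function names are illustrative.

```python
# Tints, shades, and tones as linear RGB blends (an approximation, not a standard).

def _hex_to_rgb(hex_color):
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def _mix(hex_color, target_rgb, amount):
    """Blend hex_color toward target_rgb; amount=0 keeps the color, 1 gives the target."""
    rgb = _hex_to_rgb(hex_color)
    mixed = [round(c + (t - c) * amount) for c, t in zip(rgb, target_rgb)]
    return "#{:02X}{:02X}{:02X}".format(*mixed)


def tint(hex_color, amount):
    """Add white: lighter."""
    return _mix(hex_color, (255, 255, 255), amount)


def shade(hex_color, amount):
    """Add black: darker."""
    return _mix(hex_color, (0, 0, 0), amount)


def tone(hex_color, amount):
    """Add mid gray: more muted."""
    return _mix(hex_color, (128, 128, 128), amount)


print(tint("#FF0000", 0.2))   # #FF3333
print(shade("#FF0000", 0.2))  # #CC0000
```

Blending by a small amount, as in the usage lines, is a quick way to get a border color that is a tint of your title color.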

I think it’s easiest to start your color palette by choosing the main color. This could be inspired by your corporate color palette or logo, your favorite color, or a color associated with the subject matter of your report. Be aware that color has cultural meaning and conveys emotion, and you want to choose a color that conveys the appropriate tone of your report. This can get tricky, so just try to make sure you don’t choose a main color that has a lot of cognitive dissonance with your subject. For instance, if you are creating a report for a U.S. audience about our gun violence problem, you probably wouldn’t use a light, happy, pastel green as your main color. But you could be fine using a bright red, dark blue, orange, black, or several other colors.

You will want to choose a main color that has a medium intensity.  If it is too bright or too dark, you won’t have any room to use a more intense version of that color to focus attention. And since you want your main, secondary, and tertiary color to be the same intensity, it would feel as if everything on the page were yelling at you if all three colors were bold. If you are starting from a corporate color palette, be aware that most brand color palettes were designed for websites and print collateral, not data visualization. They are usually too intense or bright to serve as your main data visualization colors. But you can use a tint or tone of your corporate colors so your reports stay on brand.

Once you have chosen your main color, you need to decide what type of color scheme you would like to use. Common options include:

  • monochromatic – tints and shades of a single hue
  • complementary – colors that are opposite each other on the color wheel
  • analogous – colors that are next to each other on the color wheel
  • split complementary – a base color plus two colors that are adjacent to its complement on the color wheel
  • triadic – colors that are evenly spaced on the color wheel

Example color schemes from ColorHexa.com

My current favorite tool for choosing colors is ColorHexa. It provides hex colors, color schemes (as shown above), tints, shades, and tones.

Once you have made some initial color choices, test them out on a few charts to ensure you can answer yes to the following questions:

  • Are all colors easily distinguishable from each other? If you were to use the main, secondary, and tertiary color in a line chart, could you easily follow the lines as they cross each other?
  • Is your color palette color blind friendly? You can use ColorHexa or Coblis to check this. It’s not always obvious when you have people with color vision deficiency in your intended audience, so it’s better to use colors that are easily distinguishable by those with red-green color blindness (deuteranomaly, deuteranopia, protanomaly, and protanopia).
  • Does your highlight color have high contrast from your other colors so it is obvious that it is being used to draw attention to a trend or data point?
  • If you are using a non-white background color, do your colors stand out sufficiently from your background?
  • When you look at your color choices, do you find the combination generally appealing, balanced, and not overly jarring? This is a bit subjective, but if you look at your colors and have a negative reaction then your audience will probably have a similar reaction.

Finally, be aware that colors display differently on different screens and surfaces. You can put a lot of time and effort into choosing the perfect colors, then share a report with someone and have it look rather different on their monitor or when viewed on a projector screen. If you can, review your colors on the equipment that your intended audience will most commonly use to make sure it looks good for them.

If you are still having trouble choosing colors, you can check out the Power BI Report Theme Gallery for some inspiration. Not every example in the gallery shows good color choices, but you can still use it to get ideas.

Once you have your color palette, you can reuse it in future reports by making a report theme. And if you aren’t a fan of manually writing JSON for your report theme, check out the Report Theme Generator from PowerBI.Tips. If you define your colors in a report theme, Power BI will create tints and shades of your colors for you, saving you the trouble of having to look them up yourself.
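If you would rather generate the theme file than hand-write it, a short script can do it. The sketch below builds a basic report theme using the `name`, `dataColors`, `background`, `foreground`, and `tableAccent` properties. The hex values and theme name are made-up placeholders, and mapping the title color to `foreground` and the border color to `tableAccent` is my assumption about how the six-color palette might translate, not a rule.

```python
import json

# Build a basic Power BI report theme from a small palette.
# All color values and the theme name are illustrative placeholders.

def build_report_theme(name, data_colors, title_color, border_color,
                       background="#FFFFFF"):
    """Return a dict in the basic report theme format."""
    return {
        "name": name,
        "dataColors": list(data_colors),   # main, color 2, color 3, highlight
        "background": background,
        "foreground": title_color,         # assumption: title color as foreground
        "tableAccent": border_color,       # assumption: border color as table accent
    }


theme = build_report_theme(
    name="Example Palette",
    data_colors=["#3A7CA5", "#81C3D7", "#A9A9A9", "#F26419"],
    title_color="#4D4D4D",
    border_color="#D9D9D9",
)

# Write this string to a .json file and import it as a report theme.
theme_json = json.dumps(theme, indent=4)
print(theme_json)
```

Keeping the palette in one place like this makes it easy to regenerate the theme if you later adjust a color.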

Power BI Color Picker

If you have advice to help others choose colors, leave a comment on this post or tweet me.

Azure, Microsoft Technologies, Power BI

Thoughts and Lessons Learned From A Power BI Embedded POC

I worked on a Power BI embedded POC where a report with an in-memory Power BI model as the dataset was embedded into an application in an “app owns data” scenario. This means that the application handles all authentication and access, and users do not need to be Active Directory users or have Power BI licenses. This can be a good fit when you want analysts to be able to change the reports as needed and immediately see the changes in the application.

High-Level Components and Steps


Overview of Power BI Embedded in an ISV Scenario
Image from Microsoft Docs: https://docs.microsoft.com/en-us/power-bi/developer/embedding

The following items are needed for embedding Power BI content into an ISV/app owns data application:

  • Azure Active Directory tenant
  • Power BI Pro account
  • Power BI dashboard, tile, or report
  • Power BI workspace
  • Power BI embedded capacity (for testing/production)
  • An application in which to embed the Power BI content

While there is pretty good documentation for this, the steps weren’t immediately clear to me because the app owns data and user owns data scenarios are mixed and matched in some parts of the documentation from Microsoft. I found there are 8 main steps to embedding content with row-level security enabled in an app owns data scenario.

  1. Create the Azure Active Directory account to be used by the embedding application. Assign a Power BI Pro license to the account.
  2. Create an app workspace in PowerBI.com. Set the workspace to private. Set the analyst who owns the report as the workspace admin. Set the service account (created in step 1) as a workspace admin.
  3. Update the Power BI report with row-level security roles and filters. Ensure that usernames and corresponding roles are available to the application.
  4. Publish the Power BI report to the app workspace.
  5. Register the application that will show the report in Azure Active Directory.
  6. Add code to the application to get the Active Directory access token.
  7. Add JavaScript to the application to create the Power BI client, get the content item to embed, create the embed token, and load the content.
  8. Provision the appropriate Power BI embedded capacity in Azure and assign the app workspace containing the report to the embedded capacity.

There is an example project in GitHub for your reference, as well as a utility to help you generate your embedding code.

Thoughts And Lessons Learned

Interestingly, row-level security works just the same as it does on PowerBI.com. You do nothing different in your PBIX file. You just don’t populate the role members in PowerBI.com. Instead, you pass the effective user in your embed token.
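As a rough illustration of passing the effective user, the sketch below builds the request body for the Power BI REST API's Generate Token call for a report in a workspace. It only constructs the JSON payload and does not call the service; the username, role name, and dataset ID are hypothetical placeholders, and you should check the current API reference for the exact shape before relying on it.

```python
import json

# Build the body for a "generate embed token" request with an effective identity.
# The values passed in below are placeholders, not real identifiers.

def build_embed_token_request(username, roles, dataset_id, access_level="View"):
    """Return the request body that carries the effective user and RLS roles."""
    return {
        "accessLevel": access_level,
        "identities": [
            {
                "username": username,       # effective user evaluated by RLS filters
                "roles": list(roles),       # RLS roles defined in the PBIX file
                "datasets": [dataset_id],   # dataset the identity applies to
            }
        ],
    }


body = build_embed_token_request(
    username="client-a-user",                 # hypothetical application user
    roles=["ClientRole"],                     # hypothetical RLS role name
    dataset_id="<your-dataset-id>",           # placeholder
)
print(json.dumps(body, indent=2))
```

Because the application supplies the username and roles, it needs its own mapping of application users to RLS roles, which is why step 3 above calls for making usernames and roles available to the application.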

Unlike using the Publish To Web feature, Full Screen mode is not available in an embedded report. You can, however, add a button on the page where you embedded the report that allows it to go full screen.

If users are just consuming a report, and you are using slicers to allow them to filter data rather than the filters pane, it’s nice to hide the filters pane. And it just takes a quick bit of JavaScript. But if you hide the filters pane and have charts where users might use the include/exclude functionality on specific data points, you will need to provide a way to reset the filters since the user can’t access the filters pane. This could be a bookmark on the report page or a button on the application page that uses the APIs to reset the filters.

As of March, you can hide visual headers on all visuals in a report in Reading View. This looks much cleaner and alleviates the issues that arise when menus at the top of one visual overlap the bottom of another. But this also means that users won’t be able to access menu options such as In-Focus Mode and Export Data. If these are important, you will want to leave your visual headers visible. If you have some pages where you would like users to export data and others where it isn’t important, consider splitting out the report so you can turn the visual headers on for one report and off for the other.

After making changes and testing your report, make sure to clear any slicer values before publishing. If you have row-level security on a field shown in a slicer and you leave values selected, the selected values will be shown to users when they view the report. For example, let’s say you have created a row-level security role that can only see Product A, but you can see everything, and you left Product A and Product B selected and deployed the report. A user who views the report next and is a member of that RLS role will see the two selected values in the slicer, even though they can’t see the data for Product B on the page. This may not be a big deal for an internal report. But now imagine this is for clients. You don’t want clients to see other clients in the list. This behavior is consistent in the Power BI web service and isn’t specific to embedding. It’s just important to remember this.

By default, a report will load the page that was shown when the user last saved it. This happens in PowerBI.com as well. In embedded solutions, the page of a report can be specified in the embedding code, essentially specifying a default page within the report when viewed through the application. If a user hits the refresh button on their browser while looking at the report, the report will be loaded to the default page rather than the page the user was last viewing.

My POC proved out that Power BI provided the functionality to add great visuals to an application page that a non-developer analyst could manage. It also helped us understand our formatting options. You can get started with Power BI embedded without having to provision the embedded node in Azure, so it’s a no/low dollar commitment to give it a try.

If you have done a Power BI embedded project, please comment and let me know what you liked and didn’t like, or if there are any ideas to which I should add a vote.

Azure, Conferences, Microsoft Technologies, Personal

Please join me for my PASS Summit Pre-Con with Melissa Coates

I’m excited to announce that I’m joining forces with Melissa Coates (aka SQL Chick) to do a full-day PASS Summit Pre-Conference Session this year!

We’ll be talking about Designing Modern Data and Analytics Solutions in Azure.

Many traditional data warehousing professionals as well as other data engineers are taking on analytics projects in Azure. There are more (and ever-changing) options available in Azure that extend our capabilities beyond what we had on premises. And there are several different ways to create an analytics solution in Azure, to the point that it can be difficult or overwhelming to have to make those technology decisions up front. We want to help you get started in Azure, provide design patterns and reference architectures, and share our lessons learned from solutions we have implemented. We’ll talk through technologies such as Azure SQL DB, Azure SQL DW, Azure Data Lake, Azure Data Factory, Azure Databricks, HDInsight, Analysis Services, Azure Machine Learning, Power BI, Virtual Machines, and more.

Approximately 30% of the day will be hands-on labs, 50% presentation, and 20% open discussion and questions.

Attendees of our session will gain a broad understanding of the fundamentals for designing data solutions in Azure, techniques for navigating the wide variety of platform choices in Azure, and suggestions for developing sound architectural systems.

I hope you’ll join us on Monday, November 5th.