Accessibility, Conferences, Microsoft Technologies

Captioning Options for Your Online Conference

Many conferences have moved online this year due to the pandemic, and many attendees are expecting captions on videos (both live and recorded) to help them understand the content. Captions can help people who are hard of hearing, but they also help people who are trying to watch presentations in noisy environments and those who lack good audio setups as they are watching sessions. Conferences arguably should have been providing live captions for the in-person events they previously held. But since captions are finally becoming a wider topic of concern, I want to discuss how captions work and what to look for when choosing how to caption content for an online conference.

There was a lot of information that I wanted to share about captions, and I wanted it to be available in one place. If you don’t have the time or desire to read this post, there is a summary at the bottom.

Note: I’m not a professional accessibility specialist. I am a former conference organizer and current speaker who has spent many hours learning about accessibility and looking into options for captioning. I’m writing about captions here to share what I’ve learned with other conference organizers and speakers.

Closed Captions, Open Captions, and Subtitles

Closed captions provide the option to turn captions on or off while watching a video. They are usually shown at the bottom of the video. Here’s an example of one of my videos on YouTube with closed captions turned on.

YouTube video with closed captions turned on and the caption text shown along the bottom. The CC button on the bottom has a red line under it indicating it is on.
YouTube video with closed captions turned on. The CC button at the bottom has a red line under it to indicate the captions are on.

The placement of the captions may vary based upon the service used and the dimensions of the screen. For instance, if I play this video full screen on my wide screen monitor, the captions cover some of the content instead of being shown below.

Open captions are always displayed with the video – there is no option to turn them off. The experience with open captions is somewhat like watching a subtitled foreign film.

But despite captions often being referred to colloquially as subtitles, there is a difference between the two. Captions are made for those who are hard of hearing or have auditory processing issues. Captions should include any essential non-speech sound in the video as well as speaker differentiation if there are multiple speakers. Subtitles are made for viewers who can hear and just need the dialogue provided in text form.

For online conferences, I would say that closed captions are preferred, so viewers can choose whether or not to show the captions.

How Closed Captions Get Created

Captions can either be created as a sort of timed transcript that gets added to a pre-recorded video, or they can be done in real time. Live captioning is sometimes called communication access real-time translation (CART).

If you are captioning a pre-recorded video, the captions get created as a companion file to your video. There are several formats for caption files, but the most common I have seen are .SRT (SubRip Subtitle) and .VTT (Web Video Text Tracks). These are known as simple closed caption formats because they are human readable – each caption appears in plain text after a timestamp or sequence number, with a blank line between captions.
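
To make that concrete, below is a minimal sketch of a .VTT file with two captions (the timestamps and text are invented for illustration). A .SRT file looks very similar, except that it numbers each caption, omits the WEBVTT header, and uses commas instead of periods in the timestamps.

WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to my session on captioning options.

00:00:05.000 --> 00:00:09.000
[audience applauds]
Thank you for joining me today.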

Who Does the Captions

There are multiple options for creating captions. The first thing to understand is that captioning is a valuable service and it costs money and/or time.

In general, there are 3 broad options for creating captions on pre-recorded video:

  • Authors or conference organizers manually create a caption file
  • Presentation software creates a caption file using AI
  • A third-party service creates a caption file with human transcription, AI, or a combination of both

Manually creating a caption file

Some video editing applications allow authors to create caption files. For example, Camtasia provides a way to manually add captions or to upload a transcript and sync it to your video.

Alternatively, there is a VTT Creator that lets you upload your video, write your captions with the video shown so you get the timing right, and then output your .VTT file.

Another approach is to use speech-to-text software to create a transcript of everything said during the presentation and then edit that transcript into a caption file.

Services like YouTube and Vimeo offer automatic captioning, so if it’s an option to upload your video privately and download the caption file from there, that is a good start. But with either service, you will need to go back through and edit the captions to ensure accuracy.

These are valid approaches when you don’t have other options, but they can be very time-consuming and the quality may vary. This might be OK for one short video, but it is probably not ideal for a conference.

If you are going to make presenters responsible for their own captions, you need to provide them with plenty of time to create the captions and suggest low-cost ways to auto-generate captions. I’ve seen estimates that it can take up to 5 hours for an inexperienced person to create captions for one hour of content. Please be aware of the time commitment you are requesting of your presenters if you put this responsibility on them.

Captions in Your Presentation Software

Depending on the platform you use, your meeting software might provide AI-driven live captioning, also known as Automatic Speech Recognition (ASR). For example, Teams offers a live caption service. As of today (November 2020), my understanding is that Zoom, GoToMeeting, and GoToWebinar do not offer built-in live caption services. Zoom allows you to designate someone to type captions or to integrate with a third-party caption service. Zoom and GoToMeeting/GoToWebinar do offer AI-generated transcriptions of meeting audio after the fact.

PowerPoint also offers live captioning via its subtitles feature. My friend Echo made a video and blog post to show the effectiveness of PowerPoint subtitles, which you can view here. There are a couple of things to note before using this PowerPoint feature:

  1. It only works while PowerPoint is in presentation mode. If you have demos or need to refer to a document or website, you will lose captions when you open the document or web browser.
  2. If you are recording a session, the subtitles will be open captions embedded into your video. Viewers will not be able to turn them off.
  3. The captions will only capture the audio of the presenter who is running the PowerPoint. Words spoken by other presenters will not appear in the captions.

Google Slides also offers live captions. The same limitations noted for PowerPoint apply to Google Slides as well.

Third-Party Caption Services

There are many companies that provide captioning services for both recorded and live sessions. This can be a good route to ensure consistency and quality, but not all services are created equal – quality will vary. For recorded sessions, you send them video files and they give you back caption files (.VTT, .SRT, or another caption file format). They generally charge per minute of content. Some companies offer only AI-generated captions. Others offer AI- or human-generated captions, or AI-generated captions with human review. Human transcription tends to cost more than AI, but it also tends to be more accurate, though I have seen some impressively accurate AI captions. Captions on recorded content are often less expensive than live captions (CART).

Below are a few companies I have come across that offer caption services. This is NOT an endorsement. I’m listing them so you can see examples of their offerings and pricing. Most of them offer volume discounts or custom pricing.

  • Otter.ai – offers AI-generated captions for both recorded and live content, bulk import/export, team vocabulary
  • 3PlayMedia – offers AI-generated and human-reviewed captions for recorded content, AI-generated captions for live content. (Their standard pricing is hidden behind a form, but it’s currently $0.60 per minute of live auto-captioning and $2.50 per minute of closed captions for recorded video.)
  • Rev – offers captions for both recorded and live content, shared glossaries and speaker names to improve accuracy.

The Described and Captioned Media Program maintains a list of captioning service vendors for your reference. If you have used a caption service for a conference and want to share your opinion to help others, feel free to leave a comment on this post.

Questions for Conference Organizers to Ask When Choosing a Captioning Vendor

For recorded or live video:

  • What is your pricing model/cost? Do you offer bulk discounts or customized pricing?
  • Where/how will captions be shown in my conference platform? (If it will overlay video content, you need to notify speakers to adjust content to make room for it. But try to avoid this issue where possible.)
  • Is there an accuracy guarantee for the captions? How is accuracy measured?
  • Can I provide a list of names and a glossary of technical terms to help improve the caption accuracy?
  • Does the captioning service support multiple speakers? Does it label speakers’ dialogue to attribute it to the right person?
  • Does the captioning service conform to DCMP or WCAG captioning standards? (Helps ensure quality and usability)
  • How does the captioning service keep my files and information secure (platform security, NDAs, etc.)?
  • What languages does the captioning service support? (Important if your sessions are not all in English)

For recorded video:

  • Does my conference platform support closed captions? (If it doesn’t, then open captions encoded into the video will be required.)
  • What file type should captions be delivered in to be added to the conference platform?
  • What is the required lead time for the captioning service to deliver the caption files?
  • How do I get videos to the caption service?

For captions on live sessions:

  • Does the live caption service integrate with my conference/webinar platform?
  • How do I get support if something goes wrong? Is there an SLA?
  • What is the expected delay from the time a word is spoken to when it appears to viewers?

Further Captioning Advice for Conference Organizers

  • Budget constraints are real, especially if you are a small conference run by volunteers that doesn’t make a profit. Low quality captions can be distracting, but no captions means you have made a decision to exclude people who need captions. Do some research on pricing from various vendors, and ask what discounts are available. You can also consider offering a special sponsorship package where a sponsor can be noted as providing captions for the conference.
  • If you are running a large conference, this should be a line item in your budget. Good captions cost money, but that isn’t an excuse to go without them.
  • If your conference includes both live and recorded sessions, you can find a vendor that does both. You’ll just want to check prices to make sure they work for you.
  • If your budget means you have to go with ASR, make sure to allow time to review and edit closed captions on recorded video.
  • Try to get a sample of the captions from your selected vendor to ensure quality beforehand. If possible for recorded videos, allow speakers to preview the captions to ensure quality. Some of them won’t, but some will. And it’s likely a few errors will have slipped through that can be caught and corrected by the speakers or the organizer team. This is especially important for deeply technical or complex topics.
  • Make sure you have plenty of lead time for recorded videos. If a speaker is a few days late delivering a video, make sure their video can still be captioned and confirm if there is an extra fee.

Final Thoughts and Recap

If you’d like more information about captions, 3PlayMedia has an Ultimate Guide to Closed Captioning with tons of good info. Feel free to share any tips or tricks you have for captioning conference sessions in the comments.

I’ve summarized the info in this post below for quick reference.

Terms to Know

  • Closed captions: captions that can be turned on and off by the viewer
  • Open captions: captions that are embedded into the video and cannot be turned off
  • CART: communication access real-time translation, a technical term for live captioning
  • ASR: automatic speech recognition, use of artificial intelligence technology to generate captions
  • .SRT and .VTT: common closed caption file formats

Choosing a Captioning Solution for Your Conference

Diagram summarizing decision points when choosing a captioning solution. For high budget, choose human generated/reviewed captions from a service. For low budget and moderate time, choose ASR captions. For no budget, choose ASR built into presentation/conference software. Otherwise, someone will need to manually create captions. If you can't provide captions, let viewers know in advance.
This diagram represents general trends and common decision points when choosing a captioning solution. Your specific situation may vary from what is shown here.

Summary of Caption Solutions

Manual creation of caption files for recorded sessions
Cost: None
Time/Effort: High
Pros:
• Doesn’t require a third-party integration
• Supports closed captions
• Works no matter what application is shown on the screen
• Works no matter what application is used to record and edit video
Cons:
• Accuracy will vary widely
• Manual syntax errors can cause the file to be unusable

Upload to YouTube, Vimeo or another service that offers free captions
Cost: None to Low
Time/Effort: Medium
Pros:
• Supports closed captions
• Works no matter what application is shown on the screen
• Works no matter what application is used to record and edit video
Cons:
• Not available for live sessions
• Requires editing of captions to achieve acceptable accuracy
• Requires an account with the service and (at least temporary) permission to upload the video
• Accuracy will vary widely

Auto-generated captions in presentation software (e.g., PowerPoint, Google Slides)
Cost: Low
Time/Effort: Low
Pros:
• Works for live and recorded sessions
• No third-party integrations required
Cons:
• Requires that all presenters use presentation software with this feature
• Must be enabled by the presenter
• Won’t work when speaker is showing another application
• Often offers only open captions
• Accuracy may vary
• Often only captures one speaker

ASR (AI-generated) captions from captioning service
Cost: Medium
Time/Effort: Low
Pros:
• Works for live and recorded sessions
• Supports closed captions
• Works no matter what application is shown on the screen
• Works no matter what application is used to record and edit video
Cons:
• Accuracy may vary
• Requires planning to meet lead times for recorded sessions
• Poor viewer experience if delay is too large during live sessions

Human-generated or human-reviewed captions from a captioning service
Cost: High
Time/Effort: Low
Pros:
• Ensures the highest quality with the lowest effort from conference organizers and speakers
• Works for live and recorded sessions
• Works no matter what application is shown on the screen
• Works no matter what application is used to record and edit video
Cons:
• Requires planning to meet lead times for recorded sessions
• Poor viewer experience if delay is too large during live sessions

I hope you find this exploration of options for captions in online conference content helpful. Let me know in the comments if you have anything to add to this post to help other conference organizers.

DAX, Microsoft Technologies, Power BI

DAX Logic and Blanks

A while back I was chatting with Shannon Lindsay on Twitter. She shares lots of useful Power BI tips there. She shared her syntax tip of the & operator being used for concatenation and the && operator being used for boolean AND, which reminded me about implicit conversions and blanks in DAX.

Before you read the below tweet, see how many of these you can guess correctly:

Blank + 5 = ? 
Blank * 5 = ?
5 / Blank = ?
0 / Blank = ?

In DAX, Blank is converted to 0 in addition and subtraction.

What about boolean logic? Do you know the result of the following expressions?

AND(True(), Blank()) = ? 
OR(True(), Blank()) = ? 
AND(False(), Blank()) = ? 
AND(Blank(), Blank()) = ? 

You can see the results as well as a few more permutations in the screenshot below.

Two tables in a Power BI report. The left table shows arithmetic operations involving blanks. For example, Blank + Blank = Blank, 0 / Blank = NaN, 5 * Blank = Blank, 5 / Blank = Infinity. The right table shows boolean operations involving blanks. True and blank = false, true or blank = true, false and blank = false, blank or blank = false
Read the left table as Number1 [operator] Number2, so 5 + Blank = 5. 5 * Blank = Blank. And 5 / Blank = Infinity. Read the right table as Bool1 [operator] Bool2, so True AND Blank = False and True OR Blank = True.

Why does this matter?

You need to understand the impact of blanks in your data. Do you really want to divide by zero when you are missing data? If you are performing a boolean AND, and your data is blank, are you ok with showing a result of False? Remember that your expression may produce undesired results rather than an error.

First, you need to be aware of where blank inputs can occur in your data. When you are writing your DAX measures, you may need to handle blanks. DAX offers the IFERROR() function to check whether the result of an expression throws an error. There is also an ISBLANK() function that you can use to check for a blank value, and a COALESCE() function to provide an alternate value when a blank is detected.
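
To sketch how these functions fit together, here are two hypothetical measures (the 'Sales'[Amount] column is a made-up placeholder for your own model) that return 0 instead of blank when there are no rows to sum:

Sales Amount (ISBLANK) =
VAR Total = SUM ( 'Sales'[Amount] ) //hypothetical column
RETURN
    IF ( ISBLANK ( Total ), 0, Total )

Sales Amount (COALESCE) =
COALESCE ( SUM ( 'Sales'[Amount] ), 0 ) //returns the first argument that is not blank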

But adding extra logic in your measures may have a performance impact. For example, the DIVIDE() function can handle divide-by-zero errors for you, but DIVIDE() may be slower than the / operator. The performance difference depends on your data and the expression you are writing. Alternatively, you can use an IF statement to check whether an input value is greater than zero using the > operator. This can be quicker than checking for blanks or errors using other functions.
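
For illustration, here are the two patterns side by side, using hypothetical [Profit] and [Revenue] measures:

Margin (DIVIDE) =
DIVIDE ( [Profit], [Revenue] ) //returns blank instead of an error when [Revenue] is 0 or blank

Margin (IF) =
IF ( [Revenue] > 0, [Profit] / [Revenue] ) //divides only when the denominator is positive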

At the end of the day, producing the correct result is more important than fast performance, but we strive to achieve both. If you have any tips for handling blanks in DAX, please share them in the comments.

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Stop Letting Accessibility Be Optional In Your Power BI Reports

We don’t talk about inclusive design nearly enough in the Power BI community. I was trying to recall the last time I saw a demo report (from Microsoft or the community) that looked like consideration was made for basic accessibility, and… it’s a pretty rare occurrence.

A woman, man, and another man in a wheelchair next to the Power BI logo.

Part of the reason for this might be that accessibility was added into Power BI after the fact, with keyboard accessible visual interactions being added in 2019 as one of the last big accessibility improvements. But I think the more likely reasons are that inclusive design requires empathy and understanding of how to build reports for people who work differently than ourselves, and Power BI accessibility features take time and effort to implement. While we can never make our reports 100% accessible for everyone, that doesn’t mean we should just not try for anyone.

Population statistics tell us that many of our colleagues have or will have a disability at some point, and many of them will be invisible. So even if you don’t see a report consumer with an obvious disability today, that doesn’t mean an existing user won’t acquire a disability or a new user with a disability won’t come along as people change roles in an organization. In addition to the permanent disabilities we normally think of, there are also temporary and situational disabilities that we should try to accommodate.

In order to start designing more inclusively, we need to increase conversation around accessibility requirements and standards for our reports. I fully understand that it can feel tedious or confusing as you get started. I hope that as Power BI matures, the accessibility features will mature as well to make it even easier to create a more accessible report by default. For now, the only way to make accessible Power BI report design easier for report creators is for us to start forming accessible design habits and to offer feedback to the Power BI team along the way.

My Accessible Report Design Proposal

This is what I would like to see from report creators in the community as well as within Microsoft. I’ll define what I mean by accessible report design in the next section.

  • Before publishing a report, implement accessible design techniques as thoroughly as possible.
  • For demonstrations of report design/UI techniques where you are providing a finished product at the end, implement accessible design techniques as thoroughly as possible.
  • For demonstrations of things that are not inherently visual, implement bare minimum accessibility or add a disclaimer to the report.
    Example: “Here’s a cool DAX technique that I just threw into a quick table or bar chart to show you the results. It hasn’t been cleaned up and made accessible (alt text, color contrast, etc.), but I would do that before publishing.”
  • For demonstrations of report design/UI techniques where you show only part of the process, implement bare minimum accessibility or add a disclaimer to the report. 
    Example: “This is the part of the report creation process about creating bookmarks, and before I publish to an audience, I want to make sure I’m following good design practices including accessibility.”

Power BI Report Accessibility

I have a full list of things to check here. That is the checklist that I use to ensure my report designs are generally accessible, when I have no specific compliance requirements or knowledge of any specific disabilities that need to be accommodated. In my opinion, this is what we should be doing in all of our reports because we want everyone in our intended audience to be able to use our reports. You’ll find a very similar checklist on Microsoft Docs.

If you need to start smaller, you can go with my bare minimum accessibility and work your way up to the full list.

Bare Minimum Accessibility

This is the short list of the most impactful (according to me) accessibility changes you can make in your report. Use this because you have to start somewhere, but realize there is more we should be doing.

  1. Ensure text and visual components have sufficient color contrast
  2. Use descriptive, purposeful chart titles
  3. Avoid using color as the only means of conveying information
  4. Set tab order on all visuals in each page
  5. Remove unnecessary jargon and acronyms from all charts

Give It a Try

I just learned that the Power BI Community Featured Data Stories Gallery theme for September is Accessibility. So here’s your chance to win a free t-shirt and internet bragging rights by showing off your accessible design skills. You need to submit your report to the Data Stories Gallery by September 30th in order for your submission to be considered. But a well designed, accessible Power BI report added to the gallery is appreciated any time of year!

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Fun with Power BI and Color Math

I recently published my color contrast report in the Power BI Data Stories Gallery. It allows you to enter two hex color values and then see the color contrast ratio and get advice on how the two colors should be used together in an accessible manner.

Screenshot of the Color Contrast calculator Power BI report. The report headline reads "How should I use these colors together in my Power BI report?". There are 2 slicers that allow you to select colors by hex value. A contrast ratio is shown along with generated advice on how to use the colors.
Color contrast calculations in a Power BI report

I could go on for paragraphs about making sure your report designs are accessible and useful for your intended audience. But this post focuses on how I made this report.

The Calculations

Color contrast (as calculated in the WCAG 2.1 success criteria) is dependent on luminance. Luminance is the relative brightness of any point in a color space, normalized to 0 for darkest black and 1 for lightest white. In order to calculate color contrast you must first get the luminance of each color.

As an example, I have colors #F3F2F1 and #007E97. In this hex notation, often written as #RRGGBB, the first two digits represent red, the middle two digits represent green, and the last two digits represent blue. Each pair of digits is a hexadecimal value corresponding to a decimal number from 0 to 255 – for example, hex F3 is 15 × 16 + 3 = 243 in decimal. The same red, green, and blue values can be represented as decimal integers, and this is what is used to calculate luminance. #F3F2F1 is RGB(243, 242, 241), and #007E97 is RGB(0, 126, 151).

On a side note, there are places in Power BI where we can change the transparency of a color, which is referred to as RGBA (where A represents opacity/transparency). But whenever you copy a hex color value out of the color palette in Power BI, you will see just the 6 digits without the A, because the A is stored separately in the UI. When you set colors using DAX formulas, you can specify the A value.

The sRGB color space is non-linear: it compensates for humans’ non-linear perception of light and color. If images were not gamma-encoded, they would allocate too many bits or too much bandwidth to highlights that humans can’t distinguish, and too few bits to the shadows humans are sensitive to, which would require more bits to maintain the same visual quality. To calculate luminance, we first have to linearize the color values.

For each color component (R, G, and B), we first divide our integer value by 255 to get a decimal value between 0 and 1. Then we apply the linearization formula:

  • if R_sRGB <= 0.04045 then R = R_sRGB / 12.92, else R = ((R_sRGB + 0.055) / 1.055) ^ 2.4
  • if G_sRGB <= 0.04045 then G = G_sRGB / 12.92, else G = ((G_sRGB + 0.055) / 1.055) ^ 2.4
  • if B_sRGB <= 0.04045 then B = B_sRGB / 12.92, else B = ((B_sRGB + 0.055) / 1.055) ^ 2.4

Note: You will find sources online that incorrectly use the number 0.03928 in the linearization formula instead of 0.04045. My understanding is that 0.03928 is incorrect for sRGB.

Then we plug those values in to calculate luminance:

L = 0.2126 * R + 0.7152 * G + 0.0722 * B

The luminance of #F3F2F1 is 0.8891. The luminance of #007E97 is 0.1716.

The final calculation is color contrast:

(L1 + 0.05) / (L2 + 0.05), where

  • L1 is the relative luminance of the lighter of the foreground or background colors, and
  • L2 is the relative luminance of the darker of the foreground or background colors.

Plugging in our example values gives (0.8891 + 0.05) / (0.1716 + 0.05) ≈ 4.24, so the color contrast between #F3F2F1 and #007E97 is 4.24, and we usually write this as 4.24:1. You can check my math here.

The Dataset

The source data for the report is generated entirely in Power Query. It starts with a simple list of the integers 0 through 255. I placed this in a query called Values.

let
    //Generate the numbers 0 through 255
    Source = List.Numbers(0,256),
    //Convert the list to a single-column table
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //Type the column as whole numbers
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", Int64.Type}})
in
    #"Changed Type"

My linearization function is called ColorConvert.

(colornum as number) =>
let
    //Linearize an sRGB component value between 0 and 1
    Source = if colornum <= 0.04045 then colornum/12.92 else Number.Power(((colornum+0.055)/1.055),2.4)
in
    Source

My main query is called Color 1. This is where all the calculations through luminance are done.

let
    //Get values 0 - 255
    Source = Values,
    //Call that column R for Red
    #"R Dec" = Table.RenameColumns(Source,{{"Column1", "R Dec"}}),
    //Crossjoin to Values to get Green values 0 - 255
    #"G Dec" = Table.AddColumn(#"R Dec", "Custom", each Values),
    #"Expanded G Dec" = Table.ExpandTableColumn(#"G Dec", "Custom", {"Column1"}, {"G Dec"}),
    //Crossjoin to Values to get Blue values 0 - 255
    #"B Dec" = Table.AddColumn(#"Expanded G Dec", "B", each Values),
    #"Expanded B Dec" = Table.ExpandTableColumn(#"B Dec", "B", {"Column1"}, {"B Dec"}),
    //Get hexadecimal values for R, G, B
    #"R Hex" = Table.AddColumn(#"Expanded B Dec", "R Hex", each Text.End("00" & Number.ToText([R Dec], "x"),2)),
    #"G Hex" = Table.AddColumn(#"R Hex", "G Hex", each Text.End("00" & Number.ToText([G Dec], "x"),2)),
    #"B Hex" = Table.AddColumn(#"G Hex", "B Hex", each Text.End("00" & Number.ToText([B Dec], "x"),2)),
    //Concatenate to get full 6-digit Hex color value
    #"Changed Hex Type" = Table.TransformColumnTypes(#"B Hex",{{"R Hex", type text}, {"G Hex", type text}, {"B Hex", type text}}),
    #"Full Hex" = Table.AddColumn(#"Changed Hex Type", "Hex", each [R Hex] & [G Hex] & [B Hex]),
    //Convert integers to decimals and linearize
    #"R Lin" = Table.AddColumn(#"Full Hex", "R Lin", each ColorConvert(([R Dec]/255))),
    #"G Lin" = Table.AddColumn(#"R Lin", "G Lin", each ColorConvert(([G Dec]/255))),
    #"B Lin" = Table.AddColumn(#"G Lin", "B Lin", each ColorConvert(([B Dec]/255))),
    //Calculate luminance with the linearized values
    #"Luminance" = Table.AddColumn(#"B Lin", "Luminance", each 0.2126 * [R Lin] + 0.7152 * [G Lin] + 0.0722 * [B Lin]),
    #"Changed Luminance Type" = Table.TransformColumnTypes(#"Luminance",{{"Luminance", type number}}),
    //Create a column for the hexadecimal value with the hash/pound at the beginning
    #"Hex Dup" = Table.DuplicateColumn(#"Changed Luminance Type", "Hex", "Hex With Hash"),
    #"Hex with Hash" = Table.TransformColumns(#"Hex Dup", {{"Hex With Hash", each "#" & _, type text}}),
    //Remove Hex and linearized RGB columns to keep model under 1 GB limit for Pro license
    #"Removed Columns" = Table.RemoveColumns(#"Hex with Hash",{"R Hex", "G Hex", "B Hex", "R Lin", "G Lin", "B Lin", "Hex"}),
    //Rename Hex with Hash to Hex
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Hex With Hash", "Hex"}})
in
    #"Renamed Columns"

In order to allow users to choose two colors, I made a reference query to Color 1 called Color 2.

let
    Source = #"Color 1"
in
    Source

If you are interested in these Power Query scripts, you can get them from this Gist.

DAX Calculations

The color contrast calculation is a DAX measure because it is dynamically calculated based upon the colors selected in the report.

Color Contrast =
IF ( MAX ( 'Color 1'[Luminance] ) > MAX ( 'Color 2'[Luminance] ),
    DIVIDE ( MAX ( 'Color 1'[Luminance] ) + 0.05, MAX ( 'Color 2'[Luminance] ) + 0.05 ),
    DIVIDE ( MAX ( 'Color 2'[Luminance] ) + 0.05, MAX ( 'Color 1'[Luminance] ) + 0.05 )
)

The advice given based upon the color contrast ratio is also a DAX measure.

Advice =
IF (
    [Color Contrast] < 3,
    "Not enough contrast for text or non-text content, use only for decorative items",
    IF (
        [Color Contrast] < 4.5,
        "Appropriate for large text at least 18pt, bold text at least 14 pt, or non-text content",
        IF (
            [Color Contrast] >= 4.5,
            "Appropriate for any size text and any non-text content"
        )
    )
)

The example charts showing the two colors as foreground and background are SVG measures.

Chart 1 =
VAR Bkgrnd =
    MAX ( 'Color 1'[Hex] )
VAR Frgrnd =
    MAX ( 'Color 2'[Hex] )
VAR chart = "data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='100' height='100' viewBox='0 0 24 24' style='background-color:" & Bkgrnd & "'><path fill= '" & Frgrnd & "' d='M7 19h-6v-11h6v11zm8-18h-6v18h6v-18zm8 11h-6v7h6v-7zm1 9h-24v2h24v-2z'/></svg>"
RETURN
    chart
Chart 2 =
VAR Bkgrnd =
    MAX ( 'Color 2'[Hex] )
VAR Frgrnd =
    MAX ( 'Color 1'[Hex] )
VAR chart = "data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='100' height='100' viewBox='0 0 24 24' style='background-color:" & Bkgrnd & "'><path fill= '" & Frgrnd & "' d='M7 19h-6v-11h6v11zm8-18h-6v18h6v-18zm8 11h-6v7h6v-7zm1 9h-24v2h24v-2z'/></svg>"
RETURN
    chart

The check or x mark to indicate whether the colors can be used together in a graph or in text is created using Unicode characters.

UseInGraph =
IF ( [Color Contrast] < 3, "✗", "✔" )
UseInText =
IF ( [Color Contrast] < 4.5, "✗", "✔" )

The RGB value shown for each color in the report is a DAX measure because storing it in the model made the model size larger than 1 GB, which would have prohibited me from deploying the report and publishing it to the web.

RGB1 =
VAR R =
    SELECTEDVALUE ( 'Color 1'[R Dec] )
VAR G =
    SELECTEDVALUE ( 'Color 1'[G Dec] )
VAR B =
    SELECTEDVALUE ( 'Color 1'[B Dec] )
RETURN
    R & "," & G & "," & B

Check Out the Report

This post was an enjoyable combination of color, Power BI, and a bit of math. It was fun to make the report since it brought together my interests in accessibility and Power BI model optimization. At the very least, I’m hoping this gives you some exposure to how accessibility guidelines are applied to reports. If you are like me, you’ll find the color math fascinating and go down that rabbit hole.

Take a few seconds, pick some colors, and give the report a try.

Accessibility, Conferences, Microsoft Technologies, PASS Summit, Power BI

I’m Speaking at Virtual PASS Summit 2020

PASS Summit has gone virtual this year, but that isn’t keeping PASS from delivering a good lineup of speakers and activities. I’m excited to be presenting a pre-con and two regular sessions this year. I know virtual delivery changes the interaction between audience and speaker, and I’m going to do everything I can to make my sessions more than just standard lecture and demo to keep things interesting.

Building Power BI Reports that Communicate Insights and Engage People (Pre-Con)

If you are into Power BI or data visualization, check out my pre-con session. It’s called Building Power BI Reports that Communicate Insights and Engage People. Unless we’ve had data visualization training, the way we learn to make reports is by copying reports that others have made. But that assumes other people were designing intentionally for human consumption. Another issue is that we often mimic example reports from tool vendors. That can be very helpful with the technical aspects of getting content on the page, but we often overlook the design aspects of reports that can make or break their usability and effectiveness in communicating information. My pre-con will begin with a discussion of how humans interpret data visualizations and how you can use that to your advantage to make better, more consumable visualizations. We’ll take those lessons and apply them specifically to Power BI and then add on some tips and tricks. Throughout the day, there will be hands-on exercises and opportunities for group conversation. And you’ll receive some resources to take with you to help you continue to improve your report designs.

Agenda slide from my pre-con session: 1) Defining Success, 2) Message & Story, 3) Designing a Visual, 4) Refine Your Report 5) Applied Power BI 6) Power BI Tricks 7) Wrap-Up
Agenda for my PASS Summit pre-con titled Building Power BI Reports that Communicate Insights and Engage People

This session is geared toward people who have at least basic familiarity with Power BI Desktop (if you can populate a bar chart on a report page, that’s good enough). If you have never opened Power BI Desktop, we might move a little fast, but you are welcome to join us and give it a try. If you are pretty good with Power BI Desktop but want to improve your data visualization skills, this session could also be a good fit for you. I hope you’ll register and join my pre-con.

Implementing Data-Driven Storytelling Techniques in Power BI

Data storytelling is a popular concept, but the techniques to implement storytelling in Power BI can be a bit elusive, especially when you have data values that change as the data is refreshed. In this session, we’ll talk about what is meant by story. Then I’ll introduce you to tool-agnostic techniques for data storytelling and show you how you can use them in Power BI. We’ll also discuss the visual hierarchy within a page and how that affects your story. You can view my session description here.

Inclusive Presentation Design

I’m also delivering a professional development session for those of us that give presentations. Most speakers have good intentions and are excited to share their knowledge and perspective, but we often exclude audience members with our presentation design. Join me in this session to discuss how to design your presentation materials with appropriate content formatted to maximize learning for your whole audience. You’ll gain a better understanding of how to enhance your delivery to make an impact on those with varying abilities to see, hear, and understand your presentation. You can view my presentation description here.

Other Pre-Cons from My Brilliant Co-Workers

If you aren’t into report design, my DCAC coworkers are delivering pre-cons that may interest you.

Denny Cherry is doing a pre-con session on Microsoft Azure Platform Infrastructure.

John Morehouse is talking about Avoiding the Storms When Migrating to Azure.

I hope you’ll join one of us for a pre-con as well as our regular sessions. With PASS Summit being virtual, the lower price and removal of travel requirements may make this conference more accessible to some who haven’t been able to attend in past years. Be sure to get yourself registered and spread the word to colleagues.

Azure, Azure Data Factory, Microsoft Technologies, Power BI

Refreshing a Power BI Dataset in Azure Data Factory

I recently needed to ensure that a Power BI imported dataset would be refreshed after populating data in my data mart. I was already using Azure Data Factory to populate the data mart, so the most efficient thing to do was to call a pipeline at the end of my data load process to refresh the Power BI dataset.

Power BI offers REST APIs to programmatically refresh your data. For Data Factory to use them, you need to register an app (service principal) in Azure AD, give it the appropriate permissions in Power BI, and store its credentials in an Azure key vault.

I’m not the first to tackle this subject. Dave Ruijter has a great blog post with code and a step-by-step explanation of how to use Data Factory to refresh a Power BI dataset. I started with his code and added onto it. Before I jump into explaining my additions, let’s walk through the initial activities in the pipeline.

ADF pipeline that uses web activities to get secrets from AKV, get an AAD auth token, and call the Power BI API to refresh a dataset. Then an Until activity and an If activity are executed.
Refresh Power BI Dataset Pipeline in Data Factory

Before you can use this pipeline, you must have:

  • an app registration in Azure AD with a secret
  • a key vault that contains the Tenant ID, Client ID of your app registration, and the secret from your app registration as separate secrets.
  • granted the data factory managed identity access to the keys in the key vault
  • allowed service principals to use the Power BI REST APIs in the Power BI tenant settings
  • granted the service principal admin access to the workspace containing your dataset

For more information on these setup steps, read Dave’s post.

The pipeline contains several parameters that need to be populated for execution.

ADF pipeline parameters

The first seven parameters are related to the key vault. The last two are related to Power BI. You need to provide the name and version of each of the three secrets in the key vault. The KeyVaultDNSName should be https://mykeyvaultname.vault.azure.net/ (replace mykeyvaultname with the actual name of your key vault). You can get your Power BI workspace ID and dataset ID from the url when you navigate to your dataset settings.

The “Get TenantId from AKV” activity retrieves the tenant ID from the key vault. The “Get ClientId from AKV” retrieves the Client ID from the key vault. The “Get Secret from AKV” activity retrieves the app registration secret from the key vault. Once all three of these activities have completed, Data Factory executes the “Get AAD Token” activity, which retrieves an auth token so we can make a call to the Power BI API.

One thing to note is that this pipeline relies on a specified version of each key vault secret. If you always want to use the current version, you can delete the SecretVersion_TenantID, SecretVersion_SPClientID, and SecretVersion_SPSecret parameters. Then change the expression used in the URL property of each of the three web activities.

For example, the URL to get the tenant ID is currently:

@concat(pipeline().parameters.KeyVaultDNSName,'secrets/',pipeline().parameters.SecretName_TenantId,'/',pipeline().parameters.SecretVersion_TenantId,'?api-version=7.0')

To always refer to the current version, remove the slash and the reference to the SecretVersion_TenantID parameter so it looks like this:

@concat(pipeline().parameters.KeyVaultDNSName,'secrets/',pipeline().parameters.SecretName_TenantId,'?api-version=7.0')

The “Call Dataset Refresh” activity is where we make the call to the Power BI API. It is doing a POST to https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes and passes the previously obtained auth token in the header.
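
Put together, the request made by this activity looks roughly like the following (the token comes from the “Get AAD Token” activity; the notifyOption body shown is one valid option, included because an ADF web activity requires a body for POST requests):

POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes
Authorization: Bearer <access token from the Get AAD Token activity>
Content-Type: application/json

{ "notifyOption": "NoNotification" }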

This is where the original pipeline ends and my additions begin.

Getting the Refresh Status

When you call the Power BI API to execute the data refresh, it is an asynchronous call. This means that the ADF activity reports success as soon as the call is made successfully; it does not wait for the refresh itself to complete.

We have to add a polling pattern to periodically check on the status of the refresh until it is complete.

We start with an until activity. In the settings of the until loop, we set the expression so that the loop executes until the RefreshStatus variable is not equal to “Unknown”. (I added the RefreshStatus variable in my version of the pipeline with a default value of “Unknown”.) When a dataset is refreshing, “Unknown” is the status returned until it completes or fails.
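
Assuming the variable is named RefreshStatus as described above, the until expression looks something like this:

@not(equals(variables('RefreshStatus'), 'Unknown'))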

ADF Until activity settings

Inside of the “Until Refresh Complete” activity are three inner activities.

ADF Until activity contents

The “Wait1” activity gives the dataset refresh a chance to execute before we check the status. I have it configured to 30 seconds, but you can change that to suit your needs. Next we get the status of the refresh.

This web activity does a GET to the same URL we used to start the dataset refresh, but it adds a parameter on the end.

GET https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes?$top={$top}

The API doesn’t accept a request ID for the newly initiated refresh, so we get the last initiated refresh by setting top equal to 1 and assume that is the refresh for which we want the status.

The API provides a JSON response containing an array called value with a property called status.
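
A trimmed response has roughly the following shape (the field values shown here are illustrative):

{
  "value": [
    {
      "refreshType": "ViaApi",
      "startTime": "2020-11-13T16:20:00Z",
      "endTime": "2020-11-13T16:26:00Z",
      "status": "Completed",
      "requestId": "00000000-0000-0000-0000-000000000000"
    }
  ]
}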

In the “Set RefreshStatus” activity, we retrieve the status value from the previous activity and set the value of the RefreshStatus variable to that value.

Setting the value of the RefreshStatus variable in the ADF pipeline

We want the status value in the first object in the value array.
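
Assuming the web activity above is named "Get Refresh Status" (your activity name may differ), the value expression for the Set Variable activity looks something like:

@activity('Get Refresh Status').output.value[0].status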

The until activity then checks the value of the RefreshStatus variable. If your dataset refresh is complete, it will have a status of “Completed”. If it failed, the status returned will be “Failed”.

The If activity checks the refresh status.

If activity expression in the ADF pipeline

If the refresh status is “Completed”, the pipeline execution is finished. If the refresh status isn’t “Completed”, we can assume the refresh has failed. If the dataset refresh fails, we want the pipeline to fail.
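
Under the same naming assumption, the If activity’s expression might be:

@equals(variables('RefreshStatus'), 'Completed')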

There isn’t a built-in way to cause the pipeline to fail, so we use a web activity to throw a bad request.

We do a POST to an invalid URL. This causes the activity to fail, which then causes the pipeline to fail.

Since this pipeline has no dependencies on datasets or linked services, you can just grab my code from GitHub and use it in your data factory.

Data Visualization, Microsoft Technologies, Power BI

Power BI Data Viz Makeover: From Drab to Fab

On July 11 at 3pm MDT, Rob Farley and I will be hosting a webinar on report design in Power BI. We will take a report that does not deliver insights, discuss what we think is missing from the report and how we would change it, and then share some tips from our report redesign.

Rob and I approach data visualization a bit differently, but we share a common goal of producing reports that are clear, usable, and useful. It’s easy to get caught up building shiny, useless things that show off tech at the expense of information. We want to give you real examples of how to improve your reports to provide the right information as well as a good user experience.

We’ll reserve some time to answer your questions and comments at the end. Come chat Power BI data viz with us.

You can register for the webinar at https://www.powerbidays.com/virtualevent/colorado-power-bi-days-2020-07-11/.

Come for the data viz tips, stay for the witty banter!

Data Visualization, Microsoft Technologies, Power BI

Data Visualization, Context, and Domain Expertise

I recently posted a graph to twitter and asked people to explain it.

Let’s look at the graph.

Bar chart showing low levels of steps in April until April 25th, when they increase about 3x and remain at that level through May.
Chart from Fitbit showing my step count from April 1 through May 23.

The graph is from Fitbit. It shows the number of steps I took each day between April 1 and May 23. We can see that I had a very low number of daily steps between April 1 and April 24. Then there is a spike where my steps almost quadruple on April 25. They decrease a bit for a couple of days while remaining well above the previous average. Then my steps increase again, staying up around 10,000 steps.

The Responses

The responses I received to my tweet largely fell into 3 categories:

  1. Complaints about the x-axis label
  2. Simple interpretations of the graph saying that the steps increased on April 25 and remained higher, often accompanied by statements that there isn’t enough data to explain why that happened.
  3. Guesses as to why the steps increased and then remained higher.

The X-Axis Label

Many of my twitter friends create data visualizations for fun and profit. It didn’t surprise me that they weren’t happy with the x-axis.

There are multiple x-axis labels that show the month and year, but the bars are at the day level. It’s unusual to see the Apr ’20 label repeated 4 times as we see in this graph. It’s not necessarily inaccurate, but its imprecision goes against convention.

The fact that multiple people commented on it demonstrates to me that the label is more distracting than helpful. The way you format your data visualizations can distract from the message. This is why I tweet and talk about bad charts and how to improve them for human consumption.

Literal Interpretation

Some people were only comfortable sticking with the information available in the chart. They acknowledged that the steps went up. Some said there were too many possible explanations to narrow it down to a certain reason why.

Speculative Explanations

I enjoyed the many guesses as to why my steps increased:

  • I suddenly got motivated to exercise more
  • I moved my office further from my bedroom
  • I’m building a really big staircase
  • The device used to track my steps changed
  • I started playing Just Dance every day
  • Covid-19 lockdown ended

A few people who know me (or at least pay attention to my twitter feed) had some insight.

I did get a new dog during the timeframe, but I got her on April 28th, not April 25th.

Also, the weather did warm up about 12 degrees Fahrenheit over the timeframe.

The Necessary Context

For the curious, here’s the real explanation.

I lost my dog Buster on April 4. He was with me for over 9 years, and he was my best friend. He was suddenly not feeling well at the end of March, and he was diagnosed with cancer. He declined rapidly, and I stayed with him on the living room floor until it was time to say goodbye. During those first 4 days of April, I really only left the living room to take Buster outside. I slept a lot that weekend and didn’t move much because I was sad.

With losing Buster, everything associated with Covid-19, and some other personal issues, I was depressed for the next few weeks. But I was also very busy with work. I had no energy to do anything else after work. And there wasn’t much to do since my city and state were on lockdown for Covid-19.

On April 25, I decided that the only way to get out of the emotional hole I was in was to get up and do something, so I walked a few miles around a nearby park. I came home and looked on PetFinder.com to see if there was a dog I’d like to adopt, and I came across a bulldog mix at Foothills Animal Shelter. I hadn’t cleaned my house since Buster died (see: depression). So I spent the rest of the weekend cleaning and dog-proofing just in case I brought the dog home.

On April 28, I adopted Izzy, an Olde English Bulldogge.

Izzy likes to walk. We walk between 2 and 4 miles each day. She is most of the reason the step count remained high throughout May.

Nice Dog. So What?

I hope what you’ll take away from this story is that to really deliver insights, you need to know the subject of your data visualizations. You need domain expertise. And it helps to have your own observations or other datasets to support the events you are visualizing.

If you don’t know me, any of the speculations could be the right answer. And the most you could do with my Fitbit data is to provide descriptive analysis, simply saying what happened without going into why. Many people who follow me on Twitter knew I recently got a dog. That explains the increase in step count from April 28 going forward. But it doesn’t address April 25th. Without the additional context of my step count in other months, you don’t know what my average step count is outside of this view. You wouldn’t know if my average count is normally closer to 3,000 or 10,000 because you don’t have that data. This is a perfect example of where you would need more data, more months of this data as well as additional datasets, to understand what is really going on. Sometimes there are actual datasets we can acquire, like weather data or Covid-19 lockdown dates. But there is no dataset for me losing Buster or struggling with depression.

This is part of why I prefer the term “data-informed decisions” over “data-driven decisions”. We often don’t have all the data to really understand what is going on. Technology has improved (see: Power BI) to make it quicker and easier to mash up data to provide a more complete picture. But we’ll still have to make decisions based upon incomplete data. If we have domain expertise, we may need to review data and ask questions to get better insights, and then rely on our knowledge and experiences to complete the picture.

This chart is also a good representation of a common issue in business intelligence: we often settle for only descriptive analytics. It may even have been a struggle just to get there. Let’s say I’m trying to become more active and using step count as a metric. You see this chart and see the increase in steps and say “That’s great. Do whatever you did last month to increase your steps even more.” Am I supposed to get another dog? Get less depressed?

Let’s pretend that my chart is not about my step count but is an operational report for an organization. That one chart tells you a trend of a single measure. We need more data, more visuals for this information to be impactful. The additional data adds necessary context. If this were a Power BI report, we might use interactivity to provide navigation paths to explore common questions about the data and to help identify what is influencing the current trend. Then you could use the report to facilitate a more productive conversation. I’m not addressing AI here, but after understanding the data and decisions made from it, you might introduce some machine learning to automate the analysis process.

Just having a report on something is not enough. The goal of data visualization is not to show off your data (if your service/product is data, that’s a different thing). It’s to help provide meaningful information to people so they can make decisions and take action. In order to do that, we need to understand our audience, the domain in which they are operating, and the questions they are trying to answer. Data visualization tools make it easy to get things on the page, but I hope you are designing your visualizations purposefully to facilitate data-informed decisions.

Microsoft Technologies, Power BI

An Updated Version of the Power BI Enterprise Deployment Whitepaper is Available

A new version of the Microsoft whitepaper “Planning a Power BI Enterprise Deployment” is now available. Once again, Melissa Coates (b|t) and Chris Webb (b|t) are the authors. I was lucky enough to be the tech editor again on this version, so I’m excited to see the new information be released to the public.

There were quite a few updates this time. Here are some of the highlights:

  • Section 3, “Power BI Architectural Choices”, has updated information on dataflows and Power BI Premium. It also includes a nice section clarifying the options available for embedding Power BI content.
  • Section 4, “Power BI Licensing and User Management”, has been updated to include information on self-service purchasing.
  • Section 5, “Power BI Source Data Considerations” now includes information on dataflows.
  • Section 6, “Power BI Dataset Storage Options” now contains information about Automatic Page Refresh and large models.
  • Section 7, “Power BI Data Refresh and Data Gateway” now mentions the Power Platform Admin Center. It also discusses dataflow refreshes in addition to dataset refreshes. And more information has been added regarding the use of gateway clusters for load balancing and high availability.
  • Section 8, “Power BI Dataset and Report Development Considerations” contains new information on shared datasets and .pbids (Power BI Data Source) files. It also has a new section providing guidance on information design and accessibility. And it provides updated information on the use of custom visuals.
  • Section 9, “Power BI Collaboration, Sharing and Distribution”, has been updated to reflect the new workspace experience. It also discusses shared and certified datasets and the new deployment pipelines. It also contains a nice decision tree to help you determine whether to use apps, workspaces, or sharing.
  • Section 10, “Power BI Administration”, has new recommendations for tenant settings. It also discusses protection metrics, custom help menus, custom branding as well as providing new information on managing workspaces and dataflows. And it discusses the new activity log and related PowerShell modules.
  • Section 11, “Power BI Security and Data Protection”, now discusses the roles in the new workspace experience as well as sensitivity labels and Microsoft Information Protection.
  • An updated list of deprecated items can be found in section 12, “Power BI Deprecated Items”.
  • Section 13, “Support, Learning, and Third-Party Tools” contains a great list of helpful resources for the Power BI practitioner.

I hope you’ll take a glance through the updated whitepaper and catch up on all the new information. Happy reading!

Conferences, Microsoft Technologies, Power BI

Power Up: Exploring the Power BI Ecosystem, May 27-28

Next week I’m speaking at the Dynamic Communities Power Up event titled “Exploring the Power BI Ecosystem“. It takes place on May 27 & 28, 2020. This exciting 2-day virtual event is designed to ensure attendees have a complete view of the Power BI product and surrounding ecosystem, provide expanded knowledge of the core components, and showcase the possibilities for continued exploration and innovation.

Sessions during the event are 2.5 hours long, to really give you time to get into a topic. There are healthy 45-minute breaks between sessions to give you time to attend to personal matters. And the sessions are recorded to give you a chance to catch anything you miss. Some sessions, including mine, offer a take-home exercise to help solidify concepts discussed during the session.

I’m presenting Data Visualization and Storytelling on May 28 at 9am EDT/1pm UTC. In this session, you will learn how to build eye-catching Power BI reports to support decision making. You will also learn the importance of data storytelling and see a realistic approach to it.

The following topics will be showcased through practical examples:

  • Creating beautiful reports: prioritizing your KPIs, playing with colors, grid
  • Choosing the best chart to illustrate your point
  • Introduction to the concept of Data Storytelling
  • Implementing quality checks on your report design
  • Implementing navigation in your report: bookmarks, drill-through, page-report tooltips, interactive Q&A

This training is a paid event, but it’s just $399 for the full 2 days. This training is great if you are a beginner-to-intermediate Power BI user trying to round out your skills across the many areas of the Power BI suite. You can head over to the website to register. I hope to see you there!