Consulting, Data Visualization, Microsoft Technologies, Power BI

Stress Cases and Data Visualization

Times are stressful right now. There is an ongoing pandemic affecting people’s health and livelihoods. Schedules are messed up, kids are home. People who aren’t used to working remotely are fumbling through learning how to work from home. And then there are the normal stresses that aren’t taking a break just because there is a pandemic: arguments with family members, home repairs, student loan debt.

Woman with her head in her hands
Life is stressful right now. It’s ok to not be ok.

I recently read the book Design for Real Life by Eric Meyer and Sara Wachter-Boettcher. I discovered it through another book by Sara Wachter-Boettcher, Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech, which encourages us to look at the technology around us and consider how it can and does negatively affect people. If you work in analytics or web design, I recommend both books. She approaches design more from the context of user interfaces and web apps, but there is a lot of applicability to data visualization.

What’s a Stress Case?

Design for Real Life encourages us to consider stress cases. The idea behind stress cases is to consider our users as full, complex, sometimes distracted, vulnerable, and stressed-out people. They aren’t just happy personas who always click the right buttons and share all of our knowledge and assumptions.

Certain use cases, often labeled as edge cases, go unidentified or get dismissed as too infrequent to matter. User interface designers, including data visualization designers, need to train ourselves to seek out those stress cases and understand how our design choices affect people in those usage scenarios. This helps us minimize the harm done by our designs. But also, as stated in the book,

When we make things for people at their worst, they’ll work that much better when people are at their best.

Design for Real Life

There is a toxic trend in tech of developers making things for ourselves, valuing developers and designers over users. Designing for stress cases helps us change that trend. We need to value our users’ time, understand our biases as designers, and make features that match our users’ priorities. A very applicable part of this for data visualization is to consider all the contexts in which our visuals might be used.

Stress Cases in Data Visualization

Whether you are designing a visualization to be embedded in an app, included in an online news article, or published in a corporate dashboard, you can design for stress cases.

Let’s think about a corporate HR report built in Power BI that shows a prediction of employee retention. HR and team managers may use this information to help them assign projects or give promotions and raises. This dashboard and the conversation around it may address information related to identity (race, gender, sexual orientation). The use of the dashboard may affect someone’s compensation and job satisfaction.

How can we design for stress cases here?

We should be asking if we have the right (appropriate and validated) data to accomplish our goals, not just using whatever we can get our hands on. We definitely need to consider laws against discrimination that might be applicable to how we use demographic data. We need to consider the cost or harm done if we allow users to see these predictions down to the individual employee rather than an aggregation. And we should always be asking what actions people are taking as a result of using our dashboard. Have we considered any unintended consequences?

We want to use plain, easy to understand language. We want to explain our data sources and (at least high-level) how the predictive algorithm works. And we need to explain how we intend for this dashboard to be used. Basically, we want to be as transparent and easy to understand as possible. This dashboard can affect people’s livelihoods and happiness as well as the operational and financial health of the business. These data points are more than just pretty dots – they are people.

We also want to make sure our dashboard is easy to use. Think about digital affordances. We want it to be clear what happens when someone interacts with it. We can get fancy with bookmarks and editing interactions in a Power BI report. But does our intended user understand what is shown when they click a button that loads a bookmark? Can the intended user easily get what they need? Or do they have to jump through hoops to get to the right page and set the right filters?

Let’s say Sarah is a new manager at an organization that is trying to improve a high attrition rate, and she needs this information for a meeting with her boss and peers. She’s sitting in an online meeting using a 3-year-old tablet with 10 minutes of remaining battery life. Her two-year-old child is running around in the next room. The group is discussing whether they should let some people go and then try to rebuild their teams. Can Sarah easily pull up the dashboard on her tablet and set the correct filters to see attrition numbers and retention factors for her team while trying to put on a brave face in front of her colleagues as she races against the tablet’s remaining battery time?

Visualizing the Coronavirus

Currently, it seems everyone in the data community wants to visualize the spread of COVID-19. Just look at the Power BI Data Stories Gallery. I get the appeal of using new and relevant data for practice, but consider your audience and the consequences of making it public. Do you trust and understand your data? Have you considered how the chart types and colors you choose can misrepresent the data? Who does it help when you publish your visualization out into the world? What message is your visualization sending? Is it unintentionally adding stress for your already stressed-out Twitter followers? Most of us are not working with city officials or healthcare practitioners. We are visualizing this “for fun”. People are showing off Power BI/Tableau/D3 skills and not necessarily communicating clearly with purpose.

Your Twitter friend/follower Frank is really concerned about the pandemic. His daughter lost her job at a restaurant. He’s worried about lining up projects for the next few months. And his wife is immunocompromised and in a higher risk group for getting COVID-19. He feels scared and helpless.

Your friend Dave is worried about his mother who lives alone a few hours away from him. He is bombarded by messages every day that say older people are more at risk of getting COVID-19. His mom says she is fine, but he wants her to come and stay with him. It’s all he’s thought about all day.

If you know nothing about epidemiology and don’t understand how the responses of various national, state, and local governments are reflected in your data, and you publish your viz with a choropleth (filled) map covered in shades of red, how will that affect Frank and Dave and everyone else who reads it? Did you add value to the COVID-19 conversation, or just increase confusion and fear?

Amanda Makulec published Ten Considerations Before You Create Another Chart About COVID-19 with some good advice, including the following.

To sum it up — #vizresponsibly; which may mean not publishing your visualizations in the public domain at all.

Amanda Makulec

Stress cases can be related to a crisis, mundane technology failures, or just situations that are stressful in the context of the user’s life. If you are visualizing data, remember that there are people who may be interacting with your visuals in less than ideal circumstances. We need to design for them as much as for our ideal use cases.

If you have real-life examples of designing for stress cases in data visualization, please share them in the comments or tweet me at @mmarie.

Accessibility, Azure, Conferences, Microsoft Technologies, SQL Saturday

I Presented with Live Captioning and Sign Language Interpreters

I had the pleasure of presenting a full-day pre-conference session on the Friday before SQLSaturday Austin-BI last weekend. I could spend paragraphs telling you how enjoyable and friendly and inclusive the event was. But I’d like to focus on one really cool aspect of my speaking experience: I had both live captioning and sign language interpreters in my pre-con session.

First, let’s talk about the captions. While PowerPoint does have live captions/subtitles, that only works when you are using PowerPoint. When you show a demo or go to a web page, taking PowerPoint off the screen, you lose that ability. So we had a special setup provided by Shawn Weisfeld (Twitter|GitHub).

How the Live Captions Worked

A presenter uses a lavalier mic that sends audio to the Epiphan Pearl. The presenter’s computer sends video to the Epiphan Pearl. The Epiphan Pearl sends audio to a computer that sends the audio to Azure and receives captions. That computer overlays the captions above the images from the presenter’s laptop. That is all sent to the projector.
Technology setup at SQLSaturday Austin- BI Edition 2020 that provided live captions

The presenter connects their laptop to the Epiphan Pearl with an HDMI cable so they can send the video (picture) from the laptop. The speaker wears a lavalier microphone, which sends audio to the Pearl. The transcription green screen computer takes audio from the Pearl, sends it to Azure to be transcribed using Cognitive Services, and overlays the returned transcription text on a green screen input that is sent back to the Pearl. The projector gets the combined output of the transcription text and the presenter’s computer video output.

You can see an example of what it looked like from my presentation on Saturday in the tweet below. There are lots more pictures of it on Twitter with the #SqlSatAustinBI hashtag.

While this setup requires a bit more hardware, it worked so well! It took about 10 minutes to get it set up in the morning. As the speaker, I didn’t have to do anything but wear a mic. It transcribed everything I said regardless of what program my laptop was showing. There was very little lag. It seemed to be less than one second between when I would say something and when we would see it on the screen. While I try to speak clearly and slowly, sometimes I slip and fall back into speaking quickly. But the transcription kept up well. Some attendees said it was great to have the captions up on the screen to help them understand what I said when I occasionally spoke too quickly. The captions are placed at the top of the screen, above the image coming from my laptop, so I didn’t have to adjust my slides or anything to allow space for the captions.

The live captions were a big success. They helped not only people who had trouble hearing, but also those who spoke English as a second language and those who weren’t familiar with some of the terms I used and needed to see them spelled out.

Presenting With Sign Language Interpreters

This was my first time presenting with sign language interpreters to help communicate with my audience. Since the pre-con session lasted multiple hours, there were two interpreters in my room. They would switch places about every hour. They were kind enough to answer a few questions for me during breaks.

I asked them if it was difficult to sign all the technical terminology used and if they tried to study up on terms ahead of time. One of them told me that they don’t study the subject and they fingerspell all the technical terms. Most of my terms were spelled out on my slides, and I saw the interpreter look at the slide to get the spelling. When someone asked a question about the font I was using, the interpreter asked me to spell it out, since it wasn’t written anywhere. I asked if having printed slides helped (I provided PDFs of the slides to the attendees at the beginning of the session). One of the interpreters told me no, because they were already watching the signer for questions, watching my slides, and listening to me.

What I loved most about having the interpreters there was that the person using the service got to fully participate in the session. They asked questions and made comments like anyone else. And they participated in hands-on small group activities.

Check out this great photo of one of the interpreters in action during a small group activity.

5 people sit in a group at a table while a sign language interpreter sits across the table and helps the group communicate
Photo of small group activities during my Power BI pre-con with a sign language interpreter in the group. Photo by Angela Tidwell

Having ASL interpreters didn’t require any extra effort on my part. I didn’t have to practice with them beforehand or provide them with any of my conference materials. They were great professionals and were able to keep up with me through lecture, demos, small group exercises, and Q&A.

Sign language interpreters cost money. And they should – they provide a valuable service. In this case, the interpreters were provided by the State of Texas because the person using the service worked for the state government. Because this was training for their job, the person’s employer was obligated to provide this service. So we were lucky that it didn’t cost us anything.

While the SQLSaturday organizers were coordinating the ASL interpreters, they found out that there is a fund in Texas that can help with accessibility services when a person’s employer doesn’t/can’t provide them. It may not be the same in every state, but it’s definitely something to look into if you need to pay for interpreters for an event like this.

Make Your Next Event More Accessible

I have organized events, and I understand the effort that it requires. I’m so happy that Angela and Mike made the effort to make SQLSaturday Austin-BI a more inclusive event. I would like to challenge you to do the same for the next event you organize or the next presentation you give at a tech conference.

Your conference may not be able to afford the Epiphan Pearl (note: the original model we used is discontinued, but there is a new model) and the Azure costs. I’d like to see SQLSaturdays join together and purchase equipment and share across events – it would be great if PASS would help with this. Or maybe a company involved in the community could sponsor them? If we can’t do that, we could always start small with the built-in capabilities in PowerPoint and work our way up from there.

It was a great experience as a speaker and as an audience member to have the live captions. And I was so happy that someone wanted to attend my session and was making the effort to sign up and request the ASL interpreters. I hope we see more of that in the future. But we need to do our part to let people know that we welcome that and we will work to make it happen.

Azure, Azure Data Factory, Microsoft Technologies

Parameterizing a REST API Linked Service in Data Factory

We can now pass dynamic values to linked services at run time in Data Factory. This enables us to do things like connecting to different databases on the same server using one linked service. Some linked services in Azure Data Factory can be parameterized through the UI. Others require that you modify the JSON to achieve your goal.

Recently, I needed to parameterize a Data Factory linked service pointing to a REST API. At this time, REST API linked services require you to modify the JSON yourself.

In order to pass dynamic values to a linked service, we need to parameterize the linked service, the dataset, and the activity.

I have a pipeline where I log the pipeline start to a database with a stored procedure, look up a username in Key Vault, copy data from a REST API to data lake storage, and log the end of the pipeline with a stored procedure. My username and password are stored in separate secrets in Key Vault. The password is retrieved via Key Vault inside the linked service, but Data Factory doesn’t currently support retrieving the username that way, so I had to roll my own Key Vault lookup with a web activity (sketched below the pipeline image).

Data Factory pipeline containing a stored procedure, web activity, copy activity, and stored procedure
Pipeline with a parameterized copy activity
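For context, the username lookup is just a web activity calling the Key Vault REST API using the data factory’s managed identity (which needs Get permission on secrets in the vault). Below is a rough sketch of what that activity could look like in the pipeline JSON; the activity name, vault name, and secret name are placeholders.

{
    "name": "Get Username",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://mykeyvault.vault.azure.net/secrets/MyUsernameSecret?api-version=7.0",
        "method": "GET",
        "authentication": {
            "type": "MSI",
            "resource": "https://vault.azure.net"
        }
    }
}

The secret value comes back in the activity output, so downstream activities can reference it with an expression like @activity('Get Username').output.value.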

I have parameterized my linked service that points to the source of the data I am copying. My linked service has 3 parameters: BaseUrl, Username, and SecretName. The JSON for my linked service is below. You can see that I need to reference the parameter as the value for the appropriate property and also define the parameter at the bottom.

{
    "name": "LS_RESTSourceParam",
    "properties": {
        "annotations": [],
        "type": "RestService",
        "typeProperties": {
            "url": "@{linkedService().BaseUrl}",
            "enableServerCertificateValidation": true,
            "authenticationType": "Basic",
            "userName": "@{linkedService().Username}",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "MyKeyVault",
                    "type": "LinkedServiceReference"
                },
            "secretName": "@{linkedService().SecretName}"
            }
        },
        "parameters": {
            "Username": {
                "type": "String"
            },
            "SecretName": {
                "type": "String"
            },
            "BaseUrl": {
                "type": "String"
            }
        }
    }
}

I have defined these three parameters in my dataset, along with one more parameter that is specific to the dataset (that doesn’t get passed to the linked service). I don’t need to set the default value on the Parameters tab of the dataset.

4 parameters defined in a data factory dataset: relativeURL, username, secret, and baseURL.
Parameters defined in the dataset

On the Connection tab of the dataset, I set the values as shown below. We can see that Data Factory recognizes that the linked service being used has 3 parameters. The relativeURL is only used in the dataset and is not passed to the linked service. The value of each of these properties is an expression that references the matching parameter on the Parameters tab of the dataset. A sketch of the resulting dataset JSON follows the screenshot.

Connection tab of the dataset in data factory, showing 3 linked service properties and one additional dataset property.
Setting the properties on the Connection tab of the dataset
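Here is a rough sketch of a dataset definition with the parameters defined and mapped through to the linked service. The dataset name and the exact expressions are illustrative; the parameter names match the screenshots above.

{
    "name": "DS_RESTSourceParam",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LS_RESTSourceParam",
            "type": "LinkedServiceReference",
            "parameters": {
                "BaseUrl": "@dataset().baseURL",
                "Username": "@dataset().username",
                "SecretName": "@dataset().secret"
            }
        },
        "parameters": {
            "relativeURL": {
                "type": "String"
            },
            "username": {
                "type": "String"
            },
            "secret": {
                "type": "String"
            },
            "baseURL": {
                "type": "String"
            }
        },
        "annotations": [],
        "type": "RestResource",
        "typeProperties": {
            "relativeUrl": "@dataset().relativeURL"
        }
    }
}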

In my copy activity, I can see my 4 dataset parameters on the Source tab. There, I can write expressions to provide the values that should be passed through to the dataset, 3 of which are passed through to the linked service. In my case, this is a child pipeline that is called from a parent pipeline. The parent passes in some values through pipeline parameters, which are used in the expressions in the copy activity source. A sketch of what that looks like in the pipeline JSON follows the screenshot.

The Source tab of the copy activity. It uses the parameterized dataset and contains expressions to set the values of the parameters.
Defining the expressions for the dataset properties on the copy activity source
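In the pipeline JSON, the copy activity then supplies a value for each dataset parameter on its dataset reference. A rough sketch of the source side is below; the pipeline parameter names and expressions here are placeholders rather than the exact ones from my pipeline.

"inputs": [
    {
        "referenceName": "DS_RESTSourceParam",
        "type": "DatasetReference",
        "parameters": {
            "relativeURL": "@pipeline().parameters.RelativeURL",
            "username": "@activity('Get Username').output.value",
            "secret": "@pipeline().parameters.SecretName",
            "baseURL": "@pipeline().parameters.BaseUrl"
        }
    }
]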

And that’s it. I can run my pipeline and have it call different REST APIs using one linked service and one dataset.

Conferences, Data Visualization, Microsoft Technologies, Power BI, SQL Saturday

New Power BI Report Design Pre-Con in 2020

I’m excited to announce that I will be offering a full-day pre-con about Power BI report design in the coming year called Bookmarks, brain pixels, and bar charts: creating effective Power BI reports. For a full session description and prerequisites, please visit the session page.

Screenshot of the eventbrite page for the pre-con at SQLSaturday Austin

I built this pre-con to help people better approach report design as an interdisciplinary activity where we are communicating with humans, not just regurgitating data or putting shiny things on a page. There are many misconceptions out there about report design. Some people see it as just a “data thing” that only developers do. Many BI developers avoid it and try to focus on what they consider to be more “hardcore data” tasks. I often hear from people that they can’t make a good report because they aren’t artistic. This hands-on session will dispel those misconceptions and help you clarify your definition of a good Power BI report. You will see how you can apply some helpful user interface design and cognitive psychology concepts to improve your reports. And you’ll leave with tips, tricks, and a list of helpful resources to use in your future report design endeavors.

Your report design choices should be intentional, not haphazard or just the Power BI defaults. We’ll review guidelines to help you make good design choices and look at good and bad examples. And we’ll spend some time as a group creating a report to implement the concepts we discuss.

Basic familiarity with Power BI is helpful for attendees. If you know how to add a visual to a report page, populate it with data, and change some colors, that’s all you need. If you feel like you lack a good process for report design to ensure your reports are polished and professional, this session will share an approach you can adopt to help accomplish your design and communication goals. If you feel like your reports are lackluster or not well-received by their intended audience, join me to learn some tips to improve. If you are a more experienced report designer and you want to learn some new techniques and see the latest Power BI reporting features, you’ll find that information in this session as well.

So far, I’m scheduled to present this session at two SQLSaturdays in 2020:

SQLSaturday Austin – BI – February 7, 2020. Please register on Eventbrite.

SQLSaturday Chicago – August 14, 2020. Please register on Eventbrite.

SQLSaturday pre-cons are very reasonably priced. This is a great way to get a full day of training on a low budget! I hope to see you in Austin or Chicago.

Accessibility, Data Visualization, Microsoft Technologies

How a new custom PowerPoint template is helping us to be more effective presenters

DCAC recently had a custom PowerPoint template built for us. We use PowerPoint for teaching technical concepts, delivering sales and marketing presentations, and more. One thing I love about working at DCAC is that each of the 6 consultants is also a speaker at conferences. So we all care about making presentation content understandable and memorable.

In addition to the general desire to be an effective presenter, I have a special interest in accessible design. I speak and consult on accessibility in Power BI report design. So I feel a responsibility to try to be accessible in all areas of visual design. In addition, I know that making things more accessible tends to increase usability for everyone.

PowerPoint templates in general reduce manual efforts to assign colors, set fonts and font sizes, and implement other design properties such as animations. I’m all for making a decision once, templatizing it, and making it easy to instantiate. But most of the templates available within PowerPoint and sold online don’t follow good presentation design practices. That is to say, the formatting gets in the way of the content.

Having a template made

Anyone can make a PowerPoint template. You just have to learn how master slides work. Lots of marketing and graphic design companies offer to create PowerPoint templates. But they are mostly concerned with staying on brand or making something shiny, and we make presentations because we want to communicate information. I wanted our template to make it easy for us to follow some good design practices, including accessibility.

So I reached out to my friend Echo Rivera, who has her own company, Creative Research Communications, that specializes in making visually engaging presentations about highly technical topics. I came across one of her tweets a couple of years ago and visited her website, and I have learned a lot from Echo’s courses and blog posts since then.

Echo worked with me to make a template that followed good design practices but was flexible enough for a variety of content and audiences. And we also made the template fairly accessible! I’m pleased with the way our template turned out. A template can’t do everything for you, but it can give you a good start.

The process of presentation template design

I found the process of designing the template interesting, so I thought I would share a bit about that.

First, I sent Echo some samples of the presentations we give and some ideas on a color palette. We met to discuss our corporate personality, our presentation needs, and our goals for the template. Then a few members of the DCAC team helped me pick some potential images for the title slide.

Next we picked icons and colors for sections that we thought we would commonly include in presentations. We stuck with a white background on some content slides to make it easier to use images and diagrams from online (so we wouldn’t have to expend effort removing image backgrounds when they weren’t transparent). It also made it easier to choose colors with good contrast.

Once we got the full draft of the template, the team reviewed it and tried it out on some slides. And because things were going too smoothly, we decided we didn’t like the image we selected for the title slide, and chose another one. We made a few adjustments to default font sizes so that slides would fit our company name and slide headings with long technology names on a single line. We also adjusted the position of the slide heading text on slides with the line and icon at the top (shown below) so the text didn’t sit directly on the line.

We made a version of the slide template that has a small single-color logo on each slide. I plan to use this for sales slides and other situations where we need to ensure our name is on a slide when taken out of context (like a screenshot of a single slide). But the logo is small and not distracting so it doesn’t negatively impact design.

We made the template expecting that team members would adjust the master slides and use the provided multi-purpose layouts as needed for each presentation. We will change icons and use blank slides to add diagrams and big images. We also have multiple options for title and thank you slides.

Benefits of our new PowerPoint template

There is a lot to the new template, and examples below are shown without a ton of background explanation. But I couldn’t write this blog post and not show some examples, and I didn’t want this post to turn into a novel.

Our slides use color contrast to create visual interest while staying close to our corporate color palette.

The spacing and font sizes allow for about 6 lines of text max in an all-text slide. Our default font size for text content is 32pt.

Content slide with large text and clean formatting
Content slide from Meagan’s presentation on accessible Power BI report design

We used icons and color to denote sections in our presentations.

I think the best thing that this process has done is encourage us to use less text and more images. Check out the recently created slides from one of John’s presentations below.

My checklist for accessibility in visual presentation design includes:

  • font sizes no smaller than 24pt
  • color contrast that meets WCAG standards (3:1 for large text, 4.5:1 for smaller text)
  • avoiding using color as the only means of conveying information
  • avoiding color combinations that are problematic for people with color vision deficiency

Echo worked patiently and diligently with me to measure and adjust colors to meet these criteria. It was great to have someone that understood these goals.

Feedback from other team members

To be honest, I think it took some time for the template to grow on people. But most of us have warmed to it. If you are used to small images, lots of text, and extensive use of clip art and smart art, switching to the template can be a bit difficult. But that’s kind of the point.

Echo gave us some videos on how to use the template and some slide makeover examples when she delivered the template, and I asked the team to watch this TEDx talk. That reinforced the design goals and choices.

I asked the team for comments, and this is what they said.

“I like having a professional-looking, themed template that isn’t just something out of the box that looks like everyone else’s. The template, plus the helpful background info force you to think about what you are puking onto a single slide. It will lead to some semi-major rewrite/re-evaluation of about every presentation I do.”

Kerry

“The template, in part with its notes and reminders, helps me to ensure that I put the proper amount of context on a given slide.  It’s a balancing act with font size and content.  If I have to reduce the font size below 32 pt and I still have content to provide, it goes on another slide.”

John

Final thoughts

Will every slide we make be perfect and beautiful? No. But this template will help us avoid common mistakes like having too much text on the slide, and it provides some built-in accessibility without us having to remember to make adjustments.

I really enjoyed working with and learning from Echo, and I recommend her for your presentation design needs. I want to note that she prefers that you take her training before she does custom design work for you so you have a mutual design foundation to build upon. But her courses are great, so that’s not a hardship.

It’s also a privilege to work with people that share and support my goals of effective presentation delivery and accessible design. PowerPoint templates are part of the branding and marketing collateral in an organization, and I love what this says about DCAC.

Accessibility, Conferences, Microsoft Power Apps, Microsoft Technologies, Power BI

I’m Speaking at Microsoft Ignite 2019

I’m happy to be speaking at Microsoft Ignite this year. I have an unconference session and a regular session, both focused on accessibility in the Power Platform.

The regular session, Techniques for accessible report design in Microsoft Power BI, will be Wednesday, November 6 at 2:15pm. In this session I’ll discuss the features available in Power BI for making accessible reports and demonstrate techniques for making your reports easier to use. This session will be recorded, so if you can’t make it to Ignite, you can catch it online.

My unconference session, Accessibility in the Microsoft Power Platform, is a chance to have a discussion about accessibility in Power BI and PowerApps. It will be held on Thursday, November 7 at 10:45 am. Unconference sessions at Ignite include facilitator-led discussion and exercises that encourage audience participation where everyone can share their experiences and opinions. If you will be at Ignite and want to share struggles or successes in improving accessibility or raising awareness of accessibility issues, please join me.

This year at Ignite there is a reservation system for unconferences. You can RSVP while you are building your schedule on the website. Walk-ins will be accepted just before the session, assuming there is room. But please RSVP if you want to be sure to get a seat in an unconference session. Unconference sessions are not recorded, so this will be an in-person session only. But I will post materials through the Ignite website once the session is over.

If you will be at Ignite, please stop by and say hello and meet Artemis the Power BI accessibility aardvark.

Data Visualization, DCAC, Microsoft Technologies, Power BI

Power BI for Communication and Marketing

We often focus on deep analysis and insights generated by machine learning when we talk about Power BI these days because it’s super cool and very fancy. But I think it’s important to remember that you can also use Power BI for simple communication of data. As humans, we are hard wired to process visual information faster than text alone. And it’s often more efficient and concise to convey information in a visual rather than text. We can show in one image what it takes paragraphs to describe. Outside of historical analysis and operational reporting, we can use data visualizations as an engaging way to convey simple facts in a communication or marketing effort. Many people do this with infographics.

Visualization can make simple numbers and facts more memorable and engaging. I enjoy using Power BI for this marketing communication purpose, so I made a report about the people I work with at Denny Cherry & Associates Consulting. I knew most of them before I started working at DCAC, so I was already aware that they were exceptional. But now you can see it, too, with our new Power BI report located at https://www.dcac.com/about-us/learn-more-about-us.

Screenshot of the home page of our About DCAC Power BI report

Data is some people’s preferred language. We can use it to discuss specific points as well as broader trends and comparisons. I have a sticker on my laptop that says “Please Talk Data To Me” for a reason. I can make sweeping statements about how my team is full of accomplished consultants, speakers, and authors. But it is much more impactful when you see the numbers: 27 Microsoft MVP awards and 8 VMWare VExpert awards over the years, a combined 113 years of experience with SQL Server, 86% of us having been a user group leader.

My Surface Pro tablet with a “Please Talk Data To Me” sticker on it

And because Power BI is interactive, you get to interact with my report, choosing which categories you want to learn more about. There is also a little guessing game built into the report. (Edit: There is currently a Power BI bug that I had to work around, so it’s helpful to know that the answers are limited to values 1-2500. But that is a blog post for another time, and for now you get a hint as to possible answers.) If I had just told you these stats, you might have listened, but you probably would have tuned out.

Our team even had fun browsing the report, seeing our collective achievements, and guessing who provided which answers.

Building the report

It was quick and easy to use Microsoft Forms to gather the data and then dump the responses to Excel. From there, I made a quick Power BI data model and put together my visualizations. The report is embedded on our website using a Publish to Web link. A few people have said they would like to do something similar for their organizations, so I’ll offer a couple of tips:

  1. Make sure the questions you ask aren’t violating any HR rules, and that employees know that some or all questions are optional. For example, I asked personal questions in the Fun Facts section about the number of children employees have. This question was optional, and our team felt comfortable answering it.
  2. Try to group your questions into categories so you can easily create separate sections and pages like I did. Then you can associate a color with each category.

If you do make something similar, I’d love to see it. Tweet me a link or screenshot at @mmarie.

Azure, Azure SQL DB, Microsoft Technologies, SQL Server

New Centralized View of SQL Resources in Azure

Yesterday some new views were made available in the Azure portal that will be helpful to those of us who create or manage Azure SQL resources.

First, a new guided approach to creating resources has been added to the Azure portal. We now have a unified experience to create Azure SQL resources that offers guidance as to the type of Azure SQL resource you need for your use case: SQL database, managed instance, or SQL Server virtual machine. This new Azure SQL blade under Marketplace offers a high-level description of each offering and the scenario that it best serves. If you already know what you want but are having trouble remembering exactly what the resource is called in the marketplace, this can also alleviate that issue.

New guided experience for creating SQL Azure resources in the Azure Portal

Notice that SQL virtual machine images are a listed offering in the new experience. As Microsoft phrased it, “SQL Server on Azure VMs is now a first-class member of the Azure SQL family.” This blade gives you an easily accessible place to see all the SQL Server VM images without having to search through lots of other unrelated VM images.

A drop-down box listing all of the available SQL Server VM images

Once your Azure SQL resources are created, you can use the new centralized management hub to administer them. Locate the Azure SQL resources blade to see a list of all of your single databases, database servers, elastic pools, managed instances, and virtual machines running SQL.

Centralized management hub for Azure SQL resources

This is the foundation for a unified database platform in Azure with more consistency across offerings and more manageability features to come in the future. For more information, read the announcement from Microsoft or watch the new video they posted on Channel 9.

Azure, Microsoft Technologies, PowerShell, SQL Server

Using Azure Automation to Shut Down a VM only if a SQL Agent Job is Not Running

I have a client who uses MDS (Master Data Services) and SSIS (Integration Services) in an Azure VM. Since we only need to execute the SQL Agent job that runs the SSIS packages infrequently, we shut down the VM when it is not in use in order to save costs. We wanted to make sure that the Azure VM did not shut down when a specific SQL Agent job was still running, so I tackled this with some PowerShell runbooks in Azure Automation.

I split the job into two parts. The first runbook simply checks if a specified SQL Agent job is running and returns a text value that indicates whether it is running. A parent runbook checks if the VM is started. If the VM is started, it calls the child runbook to check if the job is running, and then shuts down the VM if the job is not running.

It’s fairly easy and convenient to have nested PowerShell runbooks in Azure Automation. There are two main ways to call a child runbook.

  1. Inline execution
  2. Using the Start-AzureRmAutomationRunbook cmdlet

It was less obvious to me how to call a child runbook when the parent runs in Azure and the child runs on a hybrid worker, especially when you need to use the output from the child runbook in the parent. A hybrid runbook worker allows us to access resources that are behind a VNET or on premises.

Travis Roberts has a nice video on just this topic that gave me the answers I needed.

Below is my parent runbook.

# Ensures you do not inherit an AzureRMContext in your runbook
Disable-AzureRmContextAutosave -Scope Process

$connection = Get-AutomationConnection -Name AzureRunAsConnection
Connect-AzureRmAccount -ServicePrincipal -Tenant $connection.TenantID `
-ApplicationID $connection.ApplicationID -CertificateThumbprint $connection.CertificateThumbprint

$rgName='MyResourceGroup'
$vmName='MyVM'
$SubID = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'

$AzureContext = Select-AzureRmSubscription -SubscriptionId $SubID
'Check if VM is on'
$vm = ((Get-AzureRmVM -ResourceGroupName $rgName -AzureRmContext $AzureContext -Name $vmName -Status).Statuses[1]).Code
$vm
if ($vm -eq 'PowerState/running')
{
    Do
    {
        #if VM running call other runbook
        Start-Sleep -Seconds 60;
        'Check if job is running'
        $JobRunning = Start-AzureRmAutomationRunbook -AutomationAccountName 'ProgramsAutomation' -Name 'CheckRunningSQLJob' -ResourceGroupName $rgName -AzureRmContext $AzureContext -RunOn 'Backups' -Wait;
        Write-Output $JobRunning
    } Until ($JobRunning -eq 'run0')

    'Stopping VM'
    Stop-AzureRmVM -Name $vmName -ResourceGroupName $rgName -Force
}

The runbook sets the Azure context to the appropriate subscription (especially important when you are a guest user in someone else’s tenant). Then it checks if the VM is started. If it is, it goes into a do-until loop. This task isn’t super time sensitive (it’s just to save money when the VM isn’t in use), so it waits 60 seconds and then calls the child runbook to find out if my SQL Agent job is running. This makes sure that the child runbook is called at least once. If the result is that the job is not running, it stops the VM. If the job is running, the loop starts over, waiting 60 seconds before checking again. This loop is essentially polling the job status until it sees that the job is completed. One thing to note is the -Wait parameter on the end of that Start-AzureRmAutomationRunbook command. If you don’t specify the -Wait parameter, the command will immediately return a job object. If you specify the -Wait parameter, it waits for that child job to complete and returns the results of that job.

And here is my child runbook.

[OutputType([string])]

$SQLJobName = 'MySQLAgentJobName'
$SQLInstanceName = 'MySQLServer'

$cred=Get-AutomationPSCredential -Name 'mycredential'
 
$server = Connect-DbaInstance -SqlInstance $SQLInstanceName -SqlCredential $cred
 
#List currently running jobs on the instance
Get-DbaRunningJob -SqlInstance $server

$JobStatus = (Get-DbaRunningJob -SqlInstance $server).Name -match $SQLJobName

If ($JobStatus -ne $false) 
{
#job is running. Passing back a string because bits and ints were causing issues.
    $JobRunning = 'run1'
    Write-Output $JobRunning 
}
else 
{ 
#job is idle
    $JobRunning = 'run0'
    Write-Output $JobRunning 
}

I’m using dbatools to check if the job is running on the server. That is the Get-DbaRunningJob command. The important part to note is that you have to use the Write-Output command for this output to be available to the parent runbook. I got some weird results when I tried to return an int or a boolean (it was returning an object rather than a single value), so I just went with a string. The string, while not the most efficient, works just fine. If you understand why this is, feel free to leave me a comment.

These runbooks have been in place for a couple of months now, and they are working great to shut down the VM to save money while making sure not to disturb an important SQL Agent job that might occasionally run late. I didn’t find much documentation nor many examples of using output from a child job that runs on a hybrid worker, so I wanted to get this published to help others that go searching.

Data Warehousing, Microsoft Technologies, SSIS

My Preferences for SSIS Design

Lately, I have been using SSIS execution frameworks and Biml created by other people to populate data marts and data warehouses. It has taught me a few things and helped me clarify what I like and dislike compared to my usual framework. I’ve got the beginning of my preferences list started below. There are probably situations where I would want to deviate from my preferences, but I think they make a good starting point.

Populating Data

  • For self-service BI environments, a date dimension that doesn’t go out much further than the greatest date in your data. This can be a view or stored procedure that limits and updates dates rather than a static date dimension that goes out until the end of time.
  • Unknown values are included in normal dimension loads, not in separate scripts that must be run on deployment. This way, if an unknown value is ever left out or deleted, it will be added in the next data load rather than requiring a special execution of a script.
  • Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.
  • Whatever you use to create tables, include primary keys, foreign keys, and indexes with your table definitions. Provide explicit constraint names to simplify database comparisons. You can disable your foreign keys, but they need to be there to provide that metadata.
  • Separate your final dimensional/reporting tables from audit tables and staging tables. This can be done with separate schemas or even separate databases.

Data Integration Process

  • There should be consistent error handling in each layer (staging, dims, facts, etc.). If you write errors to another location (flat file, database table), have a process that notifies the right people that errors occurred. The process of consuming corrected data must be built, tested, and integrated into the existing process.
  • Make your error handling process reflect what end users need to see when an error occurs. Does it make sense to have a partial load when there is an issue? Or should it be all or nothing?
  • Have smart master packages that determine which packages to run. Don’t check whether the package should run inside of the package itself – do that in the master package.
  • Master packages should execute child packages in parallel as much as possible rather than defaulting to sequential execution.
  • Have an audit log with one row per package. Include the SSIS ServerExecutionID in the audit log – not the package-specific ID but the execution ID for the entire run. If there are incremental loads, the where clause used to filter the load should be captured in the audit table. Include row counts as well as package start and stop time in your audit log.
  • Add an AuditLogID column on your dimension, fact, and staging tables so you can trace each row back to the process that populated it.
  • For dims and facts, perform change detection/deduplication of records, usually through hash values and either SSIS lookups or SQL queries with WHERE NOT EXISTS.
  • Avoid T-SQL MERGE statements. Write individual insert/update/delete statements. This avoids any bugs in MERGE and makes your SQL easier to understand and troubleshoot.
  • Use consistent naming of tasks, sources, destinations, packages, connection managers, etc. Connection managers pointing to databases should have names that refer to the database rather than the server.
  • If you are downloading files, move the files to an archive folder once files are processed. You can have rules in place if you have retention limits. But you probably need to keep files from at least the last load for audit and troubleshooting purposes. This could change if you are importing very sensitive data.
  • Even if you need to copy all columns from a table, write a select statement for database sources that explicitly names fields rather than using SELECT * or just selecting the table or view.
  • SSIS lookups should use an explicit query rather than referencing an entire table.
  • Implement restartability at the package level for most packages (you should have single-purpose packages executed by a master package). Checkpoints are ineffective within a package. If you build your audit log table correctly, you can get the list of packages that have not run in the last X minutes/hours and feed that to your master package.
  • Send email from your scheduling tool rather than within an SSIS package.
  • Track data lineage in your tables. This can be as simple as having a table that lists all of your data sources with an ID column and including that ID value in each row of your staging, fact, and dimension tables.
  • Dims and facts are not truncated. Data should be inserted and updated (and deleted, if necessary).
  • Connection strings used in multiple packages should be project-level connection strings.

Biml Specifics

  • Understand whether you need a flexible Biml Framework or just an accelerator for a current project. If you need flexibility, don’t hardcode connection strings and other things that change when you add/change sources and destinations. If you just need to accelerate development of a simple data mart, total flexibility may be overkill and actually cause more work.
  • Have a single place where you add synthetic metadata, as much as possible. BimlScript gets messy and difficult to understand when you have some extended properties that are read in, some annotations added directly, and some variables defined in your code. This is why I like synthetic metadata stored in a database. Also, extended properties don’t exist in Azure SQL Data Warehouse, so if you need your framework to work there you can’t go that route.
  • Don’t repeat your code in multiple files. If you have some logic that gets reused, move it to a separate file and reference it from other files.

What Do You Think?

What’s on your SSIS preferences list? Do you disagree with one of my preferences and want to share your knowledge? Let’s chat in the comments.