Azure, Azure Data Factory, Microsoft Technologies

Azure Data Factory Activity Failures and Pipeline Outcomes

Question: When an activity in a Data Factory pipeline fails, does the entire pipeline fail?
Answer: It depends

In Azure Data Factory, a pipeline is a logical grouping of activities that together perform a task. It is the unit of execution – you schedule and execute a pipeline. Activities in a pipeline define actions to perform on your data. Activities can be categorized as data movement, data transformation, or control activities.

In many instances, when an activity fails during a pipeline run, the pipeline run will report failure as well. But this is not always the case.

There are two main scenarios where an activity would report failure, but the pipeline would report success:

  • The maximum number of retry attempts is greater than 0, and the initial activity execution fails but the second attempt succeeds
  • The failed activity has a failure path or a completion path to a subsequent activity and no success path

Retry Attempts

In the General settings of any activity is a property called Retry. This is the number of times Data Factory can try to execute the activity again if the initial execution fails. The default number of retries is 0. If we execute a pipeline containing one activity with the default Retry setting, the failure of the activity would cause the pipeline to fail.

Data Factory Web UI  showing the General settings of an activity with the Retry property
Data Factory Activity General settings showing the Retry Property

I often set retries to a non-zero number in copy activities, lookups, and data flows in case there are transient issues that would cause a failure that might not be present if we waited 30 seconds and tried the activity again.

Data Factory Monitoring activity runs within a pipeline. An activity failed the first time, was rerun, and succeeded the second time
Output of a Data Factory activity that was executed and initially failed. Since it was set to have 1 retry, it executed again and succeeded. If nothing else in the pipeline failed, the pipeline would report success.

Dependency with a Failure Condition

Activities are linked together via dependencies. A dependency has a condition of one of the following: Succeeded, Failed, Skipped, or Completed. If we have a pipeline containing Activity1 and Activity2, and Activity2 has a success dependency on Activity1, it will only execute if Activity1 is successful. In this scenario, if Activity1 fails, the pipeline will fail.

Activity1 has a success path to Activity2. Activity1 failed so Activity2 did not execute.
Because Activity1 failed, Activity2 is not executed and the pipeline fails.

But if we have a pipeline with two activities where Activity2 has a failure dependency on Activity1, the pipeline will not fail just because Activity1 failed. If Activity1 fails and Activity2 succeeds, the pipeline will succeed. This scenario is treated as a try-catch block by Data Factory.

Activity1 has a failure path to Activity2. Activity1 failed and Activity2 succeeded.
The failure dependency means this pipeline reports success.

Now let’s say we have a pipeline with 3 activities, where Activity1 has a success path to Activity2 and a failure path to Activity3. If Activity1 fails and Activity3 succeeds, the pipeline will fail. The presence of the success path alongside the failure path changes the outcome reported by the pipeline, even though the activity executions from the pipeline are the same as the previous scenario.

Activity1 has a success path to Activity2 and a failure path to Activity3. Activity1 failed, Activity2 was skipped, and Activity3 succeeded.
Activity1 fails, Activity2 is skipped, and Activity3 succeeds. The pipeline reports failure.

What This Means for Monitoring

This difference between pipeline and activity status has a few implications of which we should be aware as we monitor our data factories.

If we are using Azure Monitor alerts, we need to understand that setting an alert for pipeline failures doesn’t catch all activity failures. If there is a retry of an activity and the second attempt is successful, there would be an activity failure but no pipeline failure.

Conversely, if we set an alert to notify us of activity failures, and we have a pipeline designed with the try-catch pattern, we might get an alert about an activity failure, but the pipeline would still show success. You would need to look at the status of the activities within the pipeline execution to see the failure of which you were alerted.

For many of my implementations, just setting an alert to notify me when any activity failure occurs is fine. For others, I really only care if the pipeline fails. Sometimes I need to set more specific alerts where I choose only certain activities to monitor for failure.

You could also use the Data Factory SDK to roll your own monitoring solution. If you write PowerShell, C#, or Python, you can retrieve the status of any pipeline or activity run and take subsequent actions based upon the results.

What This Means for Pipeline Design

You may need to add activities to your pipelines to support your monitoring scenarios if you need something more customized than what is offered from Azure Monitor and don’t want to use the SDK.

If you have notification needs that Azure Monitor can’t accommodate, you could add an activity in your pipelines to send an email based upon your desired activity outcomes. You can cause that activity to execute using an activity dependency alone, or by combining it with a variable and an If Condition activity.

There are times where we may need a pipeline to fail even though we are using the try-catch pattern that results in pipeline success. In that case, I add an additional web activity to the end of my pipeline failure path that hits an invalid url like http://throwanerror.  The failure of this activity will cause the pipeline to fail. Keep monitoring and notifications in mind as you design your pipelines so you are alerted as appropriate.

Azure Data Factory Activity and Pipeline Outcomes

To help clarify these concepts I made the below guide to Data Factory activity and pipeline outcomes. Feel free to share it with others. You can download it directly from this link. A text version that should be friendlier for screen readers can be found on this page.

Accessibility, Data Visualization, Microsoft Technologies, Power BI

Zooming In on a Power BI Report

Have you ever tried to use your browser to zoom in on a visual in a Power BI report? If you simply published your report and then zoomed in, you might have experienced something like the video below.

Trying to zoom in on a report that is set to Fit to page can be confusing for users.

With the default settings of the report, when you zoom in, only the menus around the report change. This is because of report responsiveness and the View setting. By default, reports are set to Fit to page. Power BI is refitting the report to the page every time you zoom.

Why would we need to zoom in?

There might be accessibility or compliance reasons to allow people to zoom in. For instance, WCAG 2.1 Success Criterion 1.4.4 states “Except for captions and images of texttext can be resized without assistive technology up to 200 percent without loss of content or functionality.” People with low vision or other vision impairments might benefit from the ability to zoom within a report page.

Another reason might be that a user simply wants to focus on one chart at a time. Power BI does have a Focus mode. Unfortunately, it currently does a poor job of increasing the font sizes on the visual that is in focus, often rendering it unhelpful.

Column chart shown in Focus Mode in Power BI with large bars and tiny text
Power BI visual shown in Focus Mode

Edit: A helpful commenter pointed out that you can zoom in and out while in Focus mode. This works pretty well on many (but not all) visuals.

What Are Our Other Options?

There are a couple of workarounds for users who need to zoom in on visuals.

  1. We can set the report view or teach users to set the report view to Actual size. This then allows the browser zoom to work as anticipated. We probably don’t want to set all our reports to actual size because we would lose valuable screen real estate and diminish the experience for some users who don’t need to zoom. Having the report automatically fit to the user’s screen is usually helpful. But if users can change that setting as they need too, that might be ok. Here’s an example of how that works.
Setting the view on the Power BI report to Actual size allows users to zoom with the browser

2. We can use assistive technology to zoom. Both Windows and MacOS have built-in magnifier functionality. The downside to this is that using it would not satisfy WCAG 2.1 Success Criterion 1.4.4. I think there is still some gray area/lack of expertise as far as how people are making data visualizations WCAG compliant because it’s part text and part image/shape (although it’s not rendered on the page as an image in Power BI). I’m usually more concerned that users get the information they need an have a good experience. But I want to note this in case you are trying to be WCAG compliant and might run into this issue. Here’s an example of using the magnifier in Windows. You can still use the interactivity in the report. And you can change the size of the magnification window and the level of magnification.

The Windows Magnifier allows users to zoom in to part of the report page while retaining interactivity

3. Zooming in on the report page with a touch screen works fine. If users have tablets or laptops with a touch screen, they can use their fingers to zoom and it will behave as expected. Here’s a video that shows that experience.

Those are all the workarounds I’m aware of, but I’m interested to hear how you have worked around this issue. If you have other suggestions please leave them in the comments.

I found an existing idea about increasing the text size within visuals in focus mode on Ideas.PowerBI.com. I’ve added my vote to it, and I hope you’ll do the same.

Azure, Azure Data Lake, Microsoft Technologies, Power BI

Granting ADLS Gen2 Access for Power BI Users via ACLs

It’s common that users only have access to certain folders in an Azure Data Lake Storage container. These permissions are provided not through Azure RBAC (role-based access control) roles but through POSIX-like ACLs (access control lists).

The current Power BI documentation mentions only Azure RBAC roles, but it is possible to connect to a folder with permissions granted through ACLs.

You can manage ACLs through the Azure Storage Explorer application or in the Storage Explorer preview in the Azure Portal. As an example, I have a storage account with the hierarchical namespace enabled. In the container named filesystem1 is a folder called Test. Test contains 3 files, and I want a user to import Categories.csv into Power BI.

Azure Storage Explorer showing the mmldl storage account with filesystem1 selected. The Test folder in filesystem1 is selected and 3 files are shown.
Data lake storage account with files located in a folder called Test

If I select the Test folder and then select Manage Access, I can see that an AAD user named Data Lake User has been granted access and default ACLs. Note that the user needs at least Read and Execute. Write isn’t necessary if they don’t need to change the file.

The Manage Access window in Azure Storage Explorer. The user named Data Lake User is selected. Access and Default permissions are set to give the user Read, Write, and Execute.
Managing access on the Test folder for the Data Lake Access user

But with those permissions on the Test folder, I’m not able to connect to it from Power BI Desktop. If I try, I’ll get an error that says “Access to the resource is forbidden.”

Power BI error that says "Unable to connect. We encountered an error while trying to connect. Details: Access to the resource is forbidden."
Power BI error encountered when a user doesn’t have sufficient permissions to access a file in the data lake

This is because the user is missing some permissions. We need to grant Execute permissions on all parent folders up to the root (the container).

In this case, there is only one level above my Test folder. So I select the filesystem1 container, go to Manage Access, and grant it Execute permissions.

Manage Access window in Azure Storage Explorer showing permissions for Data Lake user on filesystem1. Execute is selected for both Access and Default permissions.
Adding Execute permissions to the parent container

Note that changing the Default ACL on a parent does not affect the access ACL or default ACL of child items that already exist. So if you have existing subfolders and files to which users need access, you will need to grant access at each parent level because the default ACLs won’t apply.

Thanks to Gerhard Brueckl for noting that I needed Execute permissions on parent folders when I got stuck in testing.

If you find yourself hitting that access forbidden message in Power BI when accessing a file in ADLS Gen2, double check the user’s Execute permissions on the parent folders.