Azure, Azure SQL DW, Data Warehousing, Microsoft Technologies, T-SQL

What You Need to Know About Data Classifications in Azure SQL Data Warehouse

Data classifications in Azure SQL DW entered public preview in March 2019. They allow you to label columns in your data warehouse with their information type and sensitivity level. There are built-in classifications, but you can also add custom classifications. This could be an important feature for auditing your storage and use of sensitive data as well as compliance with data regulations such as GDPR. You can export a report of all labeled columns, and you can see who is querying sensitive columns in your audit logs. The Azure Portal will even recommend classifications based upon your column names and data types. You can add the recommended classifications with a simple click of a button.

You can add data classifications in the Azure Portal or via T-SQL or PowerShell. Data classifications are database objects.

ADD SENSITIVITY CLASSIFICATION TO
    dbo.DimCustomer.Phone
    WITH (LABEL='Confidential', INFORMATION_TYPE='Contact Info')

To view existing data classifications, you can query the sys.sensitivity_classifications view or look in the Azure Portal.

SELECT
    sys.all_objects.name AS [TableName],
    sys.all_columns.name AS [ColumnName],
    [Label],
    [Information_Type]
FROM sys.sensitivity_classifications
LEFT JOIN sys.all_objects ON sys.sensitivity_classifications.major_id = sys.all_objects.object_id
LEFT JOIN sys.all_columns ON sys.sensitivity_classifications.major_id = sys.all_columns.object_id
    AND sys.sensitivity_classifications.minor_id = sys.all_columns.column_id

Be Careful When Loading With CTAS and Rename

One issue that is specific to using data classifications in Azure SQL DW is that it is possible to inadvertently drop your classifications when you are loading your tables using the recommended T-SQL load pattern. Typically, when using T-SQL to load a dimensional model in Azure SQL DW, we perform the following steps:

  1. Create an upsert table via CTAS with the results of a union of new data from a staging table with existing data from the dimension table
  2. Rename the dimension table to something like Dimension_OLD
  3. Rename the upsert table to Dimension
  4. Drop the Dimension_OLD table
Animation of a table load process in Azure SQL DW


In the animation above, you’ll first see the load process as described, and then it will replay with sensitivity labels added to the dimension table. You’ll see that they are dropped when we drop the old dimension table. This makes sense because sensitivity classifications are objects related to that table. We would expect an index to be dropped when we drop the related table. This works the same way.
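
In T-SQL, the load pattern in the animation looks roughly like the sketch below. The DimCustomer and StageCustomer names, the distribution choice, and the simple upsert logic are hypothetical placeholders, not code from a real project.

-- Hypothetical sketch of the CTAS-and-rename load pattern
-- 1. Build the upsert table from staged rows plus unchanged existing rows
CREATE TABLE dbo.DimCustomer_Upsert
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT s.* FROM dbo.StageCustomer s
UNION ALL
SELECT d.* FROM dbo.DimCustomer d
WHERE NOT EXISTS (SELECT 1 FROM dbo.StageCustomer s2 WHERE s2.CustomerKey = d.CustomerKey);

-- 2. and 3. Swap the tables via rename
RENAME OBJECT dbo.DimCustomer TO DimCustomer_OLD;
RENAME OBJECT dbo.DimCustomer_Upsert TO DimCustomer;

-- 4. Drop the old table; any sensitivity classifications defined on it are dropped with it
DROP TABLE dbo.DimCustomer_OLD;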

Check out my SQL notebook for a demonstration of the issue as well as my workaround that I describe below. If you spin up an Azure SQL Data Warehouse with the sample database, you can run this notebook from Azure Data Studio and see the results for yourself.

There are a few complicating factors:

  • There are currently no visual indicators of sensitivity classifications in SSMS or Azure Data Studio.
  • ETL developers may not have access to the data warehouse in the Azure Portal to see the sensitivity classifications there.
  • The entire process of adding and managing sensitivity classifications may be invisible to an ETL developer. A data modeler or business analyst might be the person adding and managing the sensitivity classifications. If the ETL developer isn’t aware classifications have been added, they won’t know to go and look for them in the sys.sensitivity_classifications view.
  • SSDT does not yet support sensitivity classifications. The only way I have found to add them into the database project is as a post-deployment script with the build property set to none.

The good news is that you can add the sensitivity classifications back to your dimension table using T-SQL. The bad news is still that the ETL developer must remember to do it. My workaround for now is a stored procedure that will do the rename and drop of the tables plus copy the sensitivity classifications over. My hope is that it’s easier to remember to use it since it will do the rename and drop for you as well.
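
As an illustration (this is a simplified sketch, not the actual SwapWithMetadata procedure), the re-apply step can be driven by a query like the one below, run after the renames but before the old table is dropped. It generates the ADD SENSITIVITY CLASSIFICATION statements for the new table from the classifications still attached to the _OLD table, which you then execute.

-- Simplified, hypothetical sketch: script out the classifications from the table that is
-- about to be dropped so they can be reapplied to the newly renamed dimension table
SELECT 'ADD SENSITIVITY CLASSIFICATION TO dbo.DimCustomer.' + c.name
     + ' WITH (LABEL=''' + sc.label + ''', INFORMATION_TYPE=''' + sc.information_type + ''')'
FROM sys.sensitivity_classifications sc
JOIN sys.tables t ON sc.major_id = t.object_id
JOIN sys.columns c ON sc.major_id = c.object_id AND sc.minor_id = c.column_id
WHERE SCHEMA_NAME(t.schema_id) = 'dbo' AND t.name = 'DimCustomer_OLD';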

Update: Someone asked about the name SwapWithMetadata and why it doesn’t specifically mention sensitivity classifications. I didn’t mention classifications because there are other things that need this same treatment. Dynamic data masking will also need to be reapplied. With dynamic data masking, it will be even more important to add it back immediately after swapping the tables rather than waiting for a full data load of all selected tables to finish and adding all classifications back. If your load takes a long time or the process fails on another table, you don’t want your data exposed without a mask to users who shouldn’t see the full information.

Eventually, the tools will be updated to provide more visibility to data sensitivity classifications, but we still need to make sure they don’t get dropped.

For now, my recommendation is that if you are going to go in and add a lot of sensitivity classifications, you create a user-defined restore point immediately afterward so that you know you have them in a backup somewhere. Azure SQL DW doesn’t do point-in-time restores the way Azure SQL DB does. It takes automatic restore points every 8 hours or so. So if someone went through the trouble of adding the sensitivity classifications and they were dropped through the data load process, there is no guarantee that you could use a backup to get them back.

Vote for My Enhancement Idea

If you would like Microsoft to add something to the product to keep sensitivity classifications from being dropped, or at least make it easier to add them back, please vote for my idea.

Not an Issue with Other Data Load Methods

Please note that if you are using other tools or methods to load your tables where you don’t swap them out, you won’t have the issue of dropping your sensitivity classifications. But I wanted to bring up this issue because I can see people spending a lot of time adding them and then suddenly losing them, and I want everyone to avoid that frustration.

Give Data Classifications a Try

I think data classifications are a good addition to SQL DW. Anything that helps us efficiently catalog and manage our sensitive data is good. I have added them in my demo environment and hope to use them in a client environment soon.

Have you tried out data classifications in SQL DW or DB? What do you think so far? If not, what is keeping you from using them?

Azure, Azure Data Factory, Microsoft Technologies

There is Now A Delete Activity in Data Factory V2!

Data Factory can be a great tool for cloud and hybrid data integration. But since its inception, it was less than straightforward to actually move data with it, that is, to copy data to another location and then delete the original copy.

It is a common practice to load data to blob storage or data lake storage before loading to a database, especially if your data is coming from outside of Azure. We often create a staging area in our data lakes to hold data until it has been loaded to its next destination. Then we delete the data in the staging area once our subsequent load is successful. But before February 2019, there was no Delete activity. We had to write an Azure Function or use a Logic App called by a Web Activity in order to delete a file. I imagine every person who started working with Data Factory had to go and look this up.

But now Data Factory V2 has a Delete activity.

Data Factory V2 Copy activity followed by a Delete activity

How the Delete Activity Works

The Delete activity can delete from the following data stores:

  • Azure blob storage
  • ADLS Gen 1
  • ADLS Gen 2
  • File systems
  • FTP
  • SFTP
  • Amazon S3

You can delete files or folders. You can also specify whether you want to delete recursively (delete including all subfolders of the specified folder). If you want to delete files/folders from a file system on a private network (e.g., on premises), you will need to use a self-hosted integration runtime running version 3.14 or higher. Data Factory will need write access to your data store in order to perform the delete.

You can log the deleted file names as part of the Delete activity. It requires you to provide a blob storage or ADLS Gen 1 or 2 account as a place to write the logs.

You can parameterize the following properties in the Delete activity itself:

  • Timeout
  • Retry
  • Delete file recursively
  • Max concurrent connections
  • Enable Logging
  • Logging folder path

You can also parameterize your dataset as usual.

All that’s required in the Delete activity is an activity name and dataset. The other properties are optional. Just be sure you have specified the appropriate file path. Maybe try this out in dev before you accidentally delete your way through prod.

To delete all contents of a folder (including subfolders), specify the folder path in your dataset and leave the file name blank, then check the box for “Delete file recursively”.

You can use a wildcard (*) to specify files, but it cannot be used for folders.
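
For reference, the JSON definition of a Delete activity looks roughly like the sketch below. I am writing the property layout from memory here (the dataset reference, recursive flag, and logging settings correspond to the options described above), so treat it as an approximation and check the Data Factory documentation for the exact schema.

{
    "name": "DeleteStagedFiles",
    "type": "Delete",
    "typeProperties": {
        "dataset": {
            "referenceName": "StagingFolderDataset",
            "type": "DatasetReference"
        },
        "recursive": true,
        "enableLogging": true,
        "logStorageSettings": {
            "linkedServiceName": {
                "referenceName": "LoggingBlobStorage",
                "type": "LinkedServiceReference"
            },
            "path": "deletelogs"
        }
    }
}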

Here’s to much more efficient development of data movement pipelines in Azure Data Factory V2.

Azure, Azure Data Factory, Microsoft Technologies, Power BI

How Many Data Gateways Does My Azure BI Architecture Need?

Computer with lock protecting data

It’s not always obvious when you need a data gateway in Azure, and not all gateways are labeled as such. So I thought I would walk through various applications that act as a data gateway and discuss when, where, and how many are needed.

Note: I’m ignoring VPN gateways and application gateways for the rest of this post. You could set up a VPN gateway to get rid of some of the data gateways, but I’m assuming your networking/VPN situation is fixed at this point and working from there. This is a common case in my experience as many organizations are not ready to jump into Express Route when first planning their BI architecture.

Let’s start with what services may require you to use a data gateway.

You will need a data gateway when you are using Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, Azure Logic Apps, Azure Data Factory, or Azure ML with a data source/destination that is in a private network that isn’t connected to your Azure subscription with a VPN gateway. Note that a private network includes on-premises data sources and Azure Virtual Machines as well as Azure SQL Databases and Azure SQL Data Warehouses that require use of VNet service endpoints rather than public endpoints.  

Luckily, many of these services can use the same data gateway. Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, and Logic Apps all use the On Premises Data Gateway. Azure Data Factory (V1 and V2) and Azure Machine Learning Studio use the Data Factory Self-Hosted Integration Runtime.

On Premises Data Gateway (Power BI et al.)

If you are using one or more of the following:

  • Power BI
  • Azure Analysis Services
  • PowerApps
  • Microsoft Flow
  • Logic Apps

and you have a data source in a private network, you need at least one gateway. But there are a few considerations that might cause you to set up more gateways.

  1. Your services must be in the same region to use the same gateway. This means that your Power BI/Office 365 region and Azure region for your Azure Analysis Services resource must match for them to all use one gateway.  If you have resources in different regions, you will need one gateway per region.
  2. You may want high availability for your gateway. You can create high availability clusters so when one gateway is down, traffic is rerouted to another available gateway in the cluster.
  3. You may want to segment traffic to ensure the necessary resources for certain ad hoc live/direct queries or scheduled refreshes. If your usage and refresh patterns warrant it, you may want to set up one gateway for scheduled refreshes and one gateway for live/direct queries back to any on-premises data sources. Or you might make sure live/direct queries for two different high-traffic models go through different gateways so as not to block each other. This isn’t always warranted, but it can be a good strategy.

Data Factory Self-hosted Integration Runtime

If you are using Azure Data Factory (V1 or V2) or Azure ML with a data source in a private network, you will need at least one gateway. But that gateway is called a Self-hosted Integration Runtime (IR).

Self-hosted IRs can be shared across data factories in the same Azure Active Directory tenant. They can be associated with up to four machines to scale out or provide higher availability. So while you may only need one node, you might want a second so that your IR is not the single point of failure.

Or you may want multiple IRs to boost throughput of copy activities. For instance, copying from an on-premises file server with one IR node is about 195 Megabytes per second (MB/s). But with 4 IR nodes, it can be as fast as 505 MB/s.

Factors that Affect the Number of Data Gateways Needed

The main factors determining the number of gateways you need are:

  1. Number of data sources in private networks (including Azure VNets)
  2. Location of services in Azure and O365 (number of regions and tenants)
  3. Desire for high availability
  4. Desire for increased throughput or segmented traffic

If you are importing your data to Azure and using an Azure SQL DB with no VNet as the source for your Power BI model, you won’t need an On Premises Data Gateway. If you use Data Factory to copy your data from an on-premises SQL Server to Azure Data Lake and then to Azure SQL DB, you will need a Self-hosted Integration Runtime.

If all your source data is already in Azure, and your source for Power BI or Azure Analysis Services is Azure SQL DW on a VNet, you will need at least one On-Premises Data Gateway.

If you import a lot of data to Azure every day using Data Factory, and you land that data to Azure SQL DW on a VNet, then use Azure Analysis Services as the data source for Power BI reports, you might want a self-hosted integration runtime with a few nodes and a couple of on-premises gateways clustered for high availability.

Have a Plan For Your Gateways

The gateways/integration runtimes are not hard to install. They are just often not considered, and projects get stalled waiting until a machine is provisioned to install them on. And many people forget to plan for high availability in their gateways. Make sure you have the right number of gateways and IR nodes to get your desired features and connectivity. You can add gateways/nodes later, but you don’t want to get caught with no high availability when it really matters.

Azure, Microsoft Technologies

The Necessary Extras That Aren’t Shown in Your Azure BI Architecture Diagram

When we talk about Azure architectures for data warehousing or analytics, we usually show a diagram that looks like the below.

Modern DW Architecture https://azure.microsoft.com/en-us/solutions/architecture/modern-data-warehouse/

This diagram is a great start to explain what services will be used in Azure to build out a solution or platform. But many times, we add the specific resource names and stop there. If you have built several projects in Azure, you will know there are some other things for which you will need to plan. So what’s missing?

Azure Active Directory

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

For instance, users who will participate in Data Factory pipeline development must be assigned to the Data Factory Contributor role in Azure. Users can authenticate via AAD to log in to Azure SQL DW. You can use AD Groups to grant permissions to users interacting with various Azure resources such as Azure SQL DW or SQL DB as well as Azure Analysis Services and Power BI. It’s more efficient to manage permissions for groups than to do it for each user. You’ll need a plan to manage group membership.
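
As a small example of the database side of this, granting an Azure AD group read access to an Azure SQL DW database only takes a couple of T-SQL statements. The group name below is made up; run this while connected to the database as an Azure AD admin.

-- Hypothetical example: create a database user for an Azure AD group and grant read access
CREATE USER [BI Report Readers] FROM EXTERNAL PROVIDER;
EXEC sp_addrolemember 'db_datareader', 'BI Report Readers';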

You may also need to register applications in Azure Active Directory to allow Azure services to authenticate and interact with each other. While the guidance for many services is now to use managed identities, this may not be available for every situation just yet.  

If your organization has some infrastructure on premises, it is likely that they have Active Directory on premises as well. So you will want to make sure you have a solution in place to sync your on-premises and Azure Active Directory.

Networking

Virtual Networks (or VNets) allow many types of Azure resources to securely communicate with each other, the internet, and on-premises networks. You can have multiple virtual networks in an Azure subscription. Each virtual network is isolated from other VNets, unless you set up VNet peering.

Some PaaS services such as Azure Storage Accounts, Azure SQL DW, and Azure Analysis Services support Virtual Network Service Endpoints. A common usage scenario is to set up VNets and VNet Service Endpoints to connect resources to on-premises networks. Some organizations prefer to use VNet Service Endpoints instead of public service endpoints, making it so that traffic can only access the resource from within the organization’s local network.

In order to connect a VNet to an on-premises network or another VNet (outside of peering), you’ll need a VPN Gateway. You’ll need to identify the most appropriate type of VPN Gateway: Point-to-Site, Site-to-Site, or Express Route. These offerings differ based on bandwidth, protocols supported, routing, connection resiliency, and SLAs. Pricing can vary greatly based upon your gateway type.

While VNets and VPN Gateways are probably the most common networking resources in Azure, there are many other networking services and related design decisions to consider as you plan an Azure deployment.

Data Gateways

Your BI solution may be entirely in Azure, but if you need to retrieve data from data sources in a private network (on premises or on a VNet), you’ll need a gateway. If you are using Azure Data Factory, you’ll need a Self-hosted Integration Runtime (IR). If the source for your Power BI or Azure Analysis Services model is on a private network, you’ll need an On-Premises Data Gateway. You can use the same gateway to connect to Analysis Services, Power BI, Logic Apps, Power Apps, and Flow. If you will have a mix of Analysis Services and Power BI models sharing the same gateway, you’ll need to make sure that your Power BI region and your AAS region match.

These gateways require that you have a machine inside the private network on which to install them. And if you want to scale out or have failover capabilities, you may need multiple gateways and multiple VMs. So while you may be building a solution in Azure, you might end up with a few on-premises VMs to be able to securely move source data.

Dev and Test Environments

Our nice and tidy diagram above is only showing production. We also need at least a development environment and maybe one or more test environments. You’ll need to decide how to design your dev/test environments. You may want duplicate resources in a separate resource group; e.g. a dev resource group that contains a dev Data Factory, a dev Azure SQL DW, a dev Azure Analysis Services, etc. While separating environments by resource group is common, it’s not your only option. You will need to decide if you prefer to separate environments by resource group, subscription, directory, or some combination of the three.

ARM templates and PowerShell are very useful for deploying updates and creating new environments. Also, take a look at Azure Blueprints.

You’ll also want to investigate ways to keep the costs of non-prod environments down via smaller-sized resources or pausing or deleting resources where applicable.

Plan For, Don’t Worry About, the Extras

There are several other ancillary Azure services that could/should be part of your solution.

  • For source control, GitHub and Azure Repos have the easiest integration, especially with Azure Data Factory. You’ll not only want source control for things like database projects and Data Factory pipelines, but also possibly for ARM templates and PowerShell scripts used to deploy resources (think: infrastructure as code).
  • You’ll want to set up Azure Monitoring, including alerts to let you know when processes and services are not running as intended.
  • If you want more cost management support than just setting a spending limit on a subscription (if it is available for your subscription type), it may be helpful to set budgets in Azure so you can be notified when you hit certain percentages or amounts.

Be sure to think through the entire data/solution management lifecycle. You may want to include extra services for cataloging, governing, and archiving your data.

This may sound like a complex list, but these resources and services are fairly easy to work with. Azure Active Directory has a user-friendly GUI in the portal, and commands can be automated with PowerShell, requiring relatively little effort to manage users and groups to support your BI environment. VNets and VPN Gateways are a little more complex, but there are step-by-step instructions available for many tasks in the Microsoft Docs. The Power BI Gateway and ADF IR have quick and easy GUI installers that just take a few minutes. You can automate Azure deployments with Azure Pipelines or PowerShell scripts.

None of these things are really that awful to implement or manage in Azure, unless you weren’t aware of them and your project is paused until you can get them set up.

Is there anything else you find is commonly left out when planning for BI solutions in Azure? Leave me a comment if you would like to add to the list.

Update (1/26/19): Helpful readers have commented on other aspects of your Azure BI architecture they felt were often overlooked:

  • Make sure you have a plan for how to process your Azure Analysis Services model. Will you use Azure Automation? Call an Azure Function from Data Factory?
  • Be sure to organize and tag your resources appropriately to help you understand and control costs.
  • Don’t forget Azure Key Vault. This will help you keep keys and passwords secure.

(Thanks to Chad Toney, Shannon Holck, and Santiago Cepas for these suggestions.)

Azure, Azure Data Factory, Microsoft Technologies

Data Factory V2 Activity Dependencies are a Logical AND

Azure Data Factory V2 allows developers to branch and chain activities together in a pipeline. We define dependencies between activities as well as their dependency conditions. Dependency conditions can be succeeded, failed, skipped, or completed.

This sounds similar to SSIS precedence constraints, but there are a couple of big differences.

  1. SSIS allows us to define expressions to be evaluated to determine if the next task should be executed.
  2. SSIS allows us to choose whether we handle multiple constraints as a logical AND or a logical OR. In other words, do we need all constraints to be true or just one?

ADF V2 activity dependencies are always a logical AND. While we can design control flows in ADF similar to how we might design control flows in SSIS, this is one of several differences. Let’s look at an example.

Data Factory V2 Pipeline with no failure dependencies

The pipeline above is a fairly common pattern. In addition to the normal ADF monitoring that is available with the product, we may log additional information to a database or file. That is what is happening in the first activity, logging the start of the pipeline execution to a database table via a stored procedure.

The second activity is a Lookup that gets a list of tables that should be loaded from a source system to a data lake. The next activity is a ForEach, executing the specified child activities for each value passed along from the list returned by the lookup. In this case the child activity includes copying data from a source to a file in the data lake.

Finally, we log the end of the pipeline execution to the database table.

Activities on Failure

This is all great as long as everything works. What if we want something else to happen in the event that one of the middle two activities fail?

This is where activity dependencies come in. Let’s say I have a stored procedure that I want to run when the Lookup or ForEach activity fails. Your first instinct might be to do the below.

Data Factory V2 Pipeline with two dependencies on failure activity

The above control flow probably won’t serve you very well. The LogFailure activity will not execute unless both the Lookup activity and the ForEach activity fail. There is no way to change the dependency condition so that LogFailure executes if the Lookup OR the ForEach fails.

Instead, you have a few options:

1) Use multiple failure activities.

Pipeline with stored procedure executed when the Lookup or ForEach activity fails

This is probably the most straightforward but least elegant option. In this option you add one activity for each potential point of failure. The stored procedure you execute in the LogLookupFailure and LogForEachFailure activities may be the same, but you need the activities to be separate so there is only one dependency for execution.

2) Create a parent pipeline and use an execute pipeline activity. Then add a single failure dependency from a stored procedure to the execute pipeline activity. This works best if you don’t really care in which activity your original/child pipeline failed and just want to log that it failed.

Execute pipeline activity with a stored procedure executed on failure

3) Use an If Condition activity and write an expression that would tell you that your previous activity failed. In my specific case I might set some activity dependencies to completed instead of success and replace the LogPipelineEnd stored procedure activity with the If Condition activity. If we choose a condition that indicates failure, our If True activity would execute the failure stored procedure and our If False activity would execute the success stored procedure.

PipelineWithIf

Think of it as a dependency, not a precedence constraint.

It’s probably better to think of activity dependencies as being different from precedence constraints. This becomes even more obvious if we look at the JSON that we would write to define this rather than using the GUI. In the example below, MyActivity2 depends on MyActivity1 succeeding. If we add another dependency to MyActivity2, it would depend on both the new dependency and the original one. Each additional dependency is ANDed with the existing dependencies.

{
    "name": "MyPipeline",
    "properties":
    {
        "description": "pipeline description",
        "activities": [
         {
            "name": "MyActivity1",
            "type": "Copy",
            "typeProperties": {
            },
            "linkedServiceName": {
            }
        },
        {
            "name": "MyActivity2",
            "type": "Copy",
            "typeProperties": {
            },
            "linkedServiceName": {
            },
            "dependsOn": [
            {
                "activity": "MyActivity1",
                "dependencyConditions": [
                    "Succeeded"
                ]
            }
          ]
        }
      ],
      "parameters": {
       }
    }
}
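
For example, if MyActivity2 also depended on a hypothetical MyActivity0, the dependsOn array would simply gain another entry, and both conditions would have to be met before MyActivity2 runs:

"dependsOn": [
    {
        "activity": "MyActivity1",
        "dependencyConditions": [ "Succeeded" ]
    },
    {
        "activity": "MyActivity0",
        "dependencyConditions": [ "Succeeded" ]
    }
]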

Do you have another way of handling this in Data Factory V2? Let me know in the comments.

If you would like to see Data Factory V2 change to let you choose how to handle multiple dependencies, you can vote for this idea on the Azure feedback site or log your own idea to suggest a different enhancement to better handle this in ADF V2.

Azure, Microsoft Technologies, Power BI

Thoughts and Lessons Learned From A Power BI Embedded POC

I worked on a Power BI Embedded POC where a report with an in-memory Power BI model as its dataset was embedded into an application in an “app owns data” scenario. This means that the application handles all authentication and access, and users do not need to be Active Directory users or have Power BI licenses. This can be a good fit when you want analysts to be able to change the reports as needed and immediately see the changes in the application.

High-Level Components and Steps


Overview of Power BI Embedded in an ISV Scenario
Image from Microsoft Docs: https://docs.microsoft.com/en-us/power-bi/developer/embedding

The following items are needed for embedding Power BI content into an ISV/app owns data application:

  • Azure Active Directory tenant
  • Power BI Pro account
  • Power BI dashboard, tile, or report
  • Power BI workspace
  • Power BI embedded capacity (for testing/production)
  • An application in which to embed the Power BI content

While there is pretty good documentation for this, the steps weren’t immediately clear to me because the app owns data and user owns data scenarios are mixed and matched in some parts of the documentation from Microsoft. I found there are 8 main steps to embedding content with row-level security enabled in an app owns data scenario.

  1. Create the Azure Active Directory account to be used by the embedding application. Assign a Power BI Pro license to the account.
  2. Create an app workspace in PowerBI.com. Set the workspace to private. Set the analyst who owns the report as the workspace admin. Set the service account (created in step 1) as a workspace admin.
  3. Update the Power BI report with row-level security roles and filters. Ensure that usernames and corresponding roles are available to the application.
  4. Publish the Power BI report to the app workspace.
  5. Register the application that will show the report in Azure Active Directory.
  6. Add code to the application to get the Active Directory access token.
  7. Add JavaScript to the application to create the Power BI client, get the content item to embed, create the embed token, and load the content.
  8. Provision the appropriate Power BI embedded capacity in Azure and assign the app workspace containing the report to the embedded capacity.

There is an example project on GitHub for your reference, as well as a utility to help you generate your embedding code.

Thoughts And Lessons Learned

Interestingly, row-level security works just the same as it does on PowerBI.com. You do nothing different in your PBIX file. You just don’t populate the role members in PowerBI.com. Instead, you pass the effective user in your embed token.
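
For reference, the effective user goes in the body of the request that generates the embed token. A rough sketch of that request body (the username, role, and dataset values are placeholders) looks something like this:

{
    "accessLevel": "View",
    "identities": [
        {
            "username": "someuser@contoso.com",
            "roles": [ "SalesRepRole" ],
            "datasets": [ "<dataset id>" ]
        }
    ]
}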

Unlike using the Publish To Web feature, Full Screen mode is not available in an embedded report. You can, however, add a button on the page where you embedded the report that allows it to go full screen.

If users are just consuming a report, and you are using slicers to allow them to filter data rather than the filters pane, it’s nice to hide the filter pane. And it just takes a quick bit of JavaScript. But if you hide the filters pane and have charts where users might use the include/exclude functionality on specific data points, you will need to provide a way to reset the filters since the user can’t access the filters pane. This could be a bookmark on the report page or a button on the application page that uses the APIs to reset the filters.
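
Hiding the filter pane is just a setting in the configuration object you pass to the Power BI JavaScript client when you embed the report. A rough sketch of that configuration, shown here as JSON with placeholder IDs and token (tokenType 1 indicates an embed token), might look like this:

{
    "type": "report",
    "id": "<report id>",
    "embedUrl": "https://app.powerbi.com/reportEmbed?reportId=<report id>&groupId=<workspace id>",
    "accessToken": "<embed token>",
    "tokenType": 1,
    "settings": {
        "filterPaneEnabled": false,
        "navContentPaneEnabled": true
    }
}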

As of March, you can hide visual headers on all visuals in a report in Reading View. This looks much cleaner and alleviates the issues that arise when menus at the top of one visual overlap the bottom of another. But this also means that users won’t be able to access menu options such as In-Focus Mode and Export Data. If these are important, you will want to leave your visual headers visible. If you have some pages where you would like users to export data and others where it isn’t important, consider splitting out the report so you can turn the visual headers on for one report and off for the other.

If you have row-level security on a field shown in a slicer, make sure to clear any selected slicer values after making changes and testing your report, before you publish it. Any values left selected will be shown to users when they view the report. For example, let’s say you have created a row-level security role that can only see Product A, but you can see everything, and you left Product A and Product B selected when you deployed the report. A user who views the report next and is a member of that RLS role will see the two selected values in the slicer, even though they can’t see the data for Product B on the page. This may not be a big deal for an internal report. But now imagine this is for clients. You don’t want clients to see other clients in the list. This behavior is consistent in the Power BI web service and isn’t specific to embedding. It’s just important to remember this.

By default, a report will load the page that was shown when the user last saved it. This happens in PowerBI.com as well. In embedded solutions, the page of a report can be specified in the embedding code, essentially specifying a default page within the report when viewed through the application. If a user hits the refresh button on their browser while looking at the report, the report will be loaded to the default page rather than the page the user was last viewing.

My POC proved out that Power BI provided the functionality to add great visuals to an application page that a non-developer analyst could manage. It also helped us understand our formatting options. You can get started with Power BI embedded without having to provision the embedded node in Azure, so it’s a no/low dollar commitment to give it a try.

If you have done a Power BI embedded project, please comment and let me know what you liked and didn’t like, or if there are any ideas to which I should add a vote.

Azure, Conferences, Microsoft Technologies, Personal

Please join me for my PASS Summit Pre-Con with Melissa Coates

I’m excited to announce that I’m joining forces with Melissa Coates (aka SQL Chick) to do a full-day PASS Summit Pre-Conference Session this year!

We’ll be talking about Designing Modern Data and Analytics Solutions in Azure.

Many traditional data warehousing professionals as well as other data engineers are taking on analytics projects in Azure. There are more (and ever-changing) options available in Azure that extend our capabilities beyond what we had on premises. And there are several different ways to create an analytics solution in Azure, to the point that it can be difficult or overwhelming to have to make those technology decisions up front.  We want to help you get started in Azure, provide design patterns and reference architectures, and share our lessons learned from solutions we have implemented. We’ll talk through technologies such as Azure SQL DB, Azure SQL DW, Azure Data Lake, Azure Data Factory, Azure Databricks, HDInsight, Analysis Services, Azure Machine Learning, Power BI, Virtual Machines, and more.

Approximately 30% of the day will be hands-on labs, 50% presentation, and 20% open discussion and questions.

Attendees of our session will gain a broad understanding of the fundamentals for designing data solutions in Azure, techniques for navigating the wide variety of platform choices in Azure, and suggestions for developing sound architectural systems.

I hope you’ll join us on Monday, November 5th.