Azure, Azure Data Factory, KQL, Microsoft Technologies

Retrieving Log Analytics Data with Data Factory

I’ve been working on a project where I use Azure Data Factory to retrieve data from the Azure Log Analytics API. The query language used by Log Analytics is Kusto Query Language (KQL). If you know T-SQL, a lot of the concepts translate to KQL. Here’s an example T-SQL query and what it might look like in KQL.

--T-SQL: 
SELECT * FROM dbo.AzureDiagnostics 
WHERE TimeGenerated BETWEEN '2020-12-15' AND '2020-12-16'
AND database_name_s = 'mydatabasename'

//KQL: 
AzureDiagnostics 
| where TimeGenerated between(datetime('2020-12-15') .. datetime('2020-12-16')) 
| where database_name_s == 'mydatabasename'

For this project, we have several Azure SQL Databases configured to send logs and metrics to a Log Analytics workspace. You can execute KQL queries against the workspace in the Log Analytics user interface in the Azure Portal, in a notebook in Azure Data Studio, or directly through the API. The format of the data returned by the API leaves something to be desired (it’s like someone shoved a CSV inside a JSON document), but it’s usable after a bit of parsing based upon column position. Just be sure your KQL query explicitly lists the columns and their order (this can be done using the project operator).
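For reference, the query endpoint returns results in a shape roughly like the following (abbreviated, with columns matching the earlier example). Each table contains a columns array and a rows array of value arrays, which is why column order matters.

{
  "tables": [
    {
      "name": "PrimaryResult",
      "columns": [
        { "name": "TimeGenerated", "type": "datetime" },
        { "name": "database_name_s", "type": "string" }
      ],
      "rows": [
        [ "2020-12-15T01:23:45Z", "mydatabasename" ],
        [ "2020-12-15T01:24:45Z", "mydatabasename" ]
      ]
    }
  ]
}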

You can use an Azure Data Factory copy activity to retrieve the results of a KQL query and land them in an Azure Storage account. You must first execute a web activity to get a bearer token, which gives you the authorization to execute the query.

[Image: Data Factory pipeline containing a web activity to get a bearer token, followed by a copy activity that retrieves data from the Log Analytics API.]

I had to create an app registration in Azure Active Directory so the web activity could get the bearer token. The web activity should perform a POST to the following URL (with your domain populated and without the quotes): "https://login.microsoftonline.com/[your domain]/oauth2/token"

Make sure you have added the appropriate header of Content-Type: application/x-www-form-urlencoded. The body should contain your service principal information and identify the resource as "resource=https://api.loganalytics.io". For more information about this step, see the API documentation.
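As a rough sketch (property names simplified, and the client ID and secret placeholders come from your app registration; store them securely, for example in Key Vault), the web activity settings end up looking something like this:

{
  "url": "https://login.microsoftonline.com/[your domain]/oauth2/token",
  "method": "POST",
  "headers": {
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "body": "grant_type=client_credentials&client_id=[client ID]&client_secret=[client secret]&resource=https://api.loganalytics.io"
}

The token comes back in the access_token property of the activity output, which is what the copy activity references later.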

Data Factory Copy Activity

The source of the copy activity uses the REST connector. The base URL is set to "https://api.loganalytics.io/v1/workspaces/[workspace ID]/" (with your workspace ID populated and without the quotes). Authentication is set to Anonymous. Below is my source dataset for the copy activity. Notice that the relative URL is set to "query".

[Image: ADF dataset referencing a REST linked service that points to the Log Analytics API. The base URL is https://api.loganalytics.io/v1/workspaces/[workspace ID] (workspace ID not shown), and the relative URL is "query".]
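If it helps to see this outside the UI, a simplified sketch of the linked service and dataset JSON looks something like the following (the names LS_LogAnalyticsAPI and DS_LogAnalyticsQuery are just placeholders, and exact property names can vary by connector version):

{
  "name": "LS_LogAnalyticsAPI",
  "properties": {
    "type": "RestService",
    "typeProperties": {
      "url": "https://api.loganalytics.io/v1/workspaces/[workspace ID]/",
      "authenticationType": "Anonymous"
    }
  }
}

{
  "name": "DS_LogAnalyticsQuery",
  "properties": {
    "type": "RestResource",
    "linkedServiceName": {
      "referenceName": "LS_LogAnalyticsAPI",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "relativeUrl": "query"
    }
  }
}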

The Source properties of the copy activity should reference this REST dataset. The request method should be POST, and the KQL query should be placed in the request body (more on this below).

Two additional headers need to be added in the Source properties.

[Image: Additional headers in the Source properties of the ADF copy activity: 1) content-type: application/json; charset=utf-8, and 2) Authorization: @concat('Bearer ', activity('Get Bearer Token').output.access_token).]

The Authorization header should pass a string formatted as “Bearer [Auth Token]” (with a space between the string “Bearer” and the token). The example above retrieves the token from the web activity that executes before the copy activity in the pipeline. Make sure you are securing your inputs and outputs so your secrets and tokens are not being logged in Data Factory. This option is currently found on the General properties of each activity.
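Putting the source properties together, the relevant part of the copy activity JSON ends up looking roughly like this (simplified from memory, so treat the exact property names as approximate; the request body is covered in the next section):

"source": {
  "type": "RestSource",
  "requestMethod": "POST",
  "additionalHeaders": {
    "content-type": "application/json; charset=utf-8",
    "Authorization": "@concat('Bearer ', activity('Get Bearer Token').output.access_token)"
  },
  "requestBody": "[the KQL query wrapped in JSON, shown below]"
}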

Embedding a KQL Query in the Copy Activity

You must pass the KQL query to the API as a JSON string. But that string is itself embedded in the JSON that Data Factory generates, and Data Factory is a bit picky about how you enter the query. Here is an example of how to populate the request body in the copy activity.

{
"query": "AzureDiagnostics | where TimeGenerated between(datetime('2020-12-15') .. datetime('2020-12-16')) | where database_name_s == 'mydatabasename'" 
}

Note that the curly braces are on separate lines, but the query must be on one line. So where I had my query spread across 3 lines in the Log Analytics user interface as shown at the beginning of this post, I have to delete the line breaks for the query to work in Data Factory.

The other thing to note is that I am using single quotes to contain string literals. KQL supports either single or double quotes to encode string literals. But using double quotes in your KQL and then putting that inside the double quotes in the request body in ADF leads to errors and frustration (ask me how I know). So make it easy on yourself and use single quotes for any string literals in your KQL query.

In my project, we were looping through multiple databases for customized time frames, so my request body is dynamically populated. Below is a request body similar to what I use for my copy activity that retrieves Azure Metrics such as CPU percent and data storage percent. The values come from a lookup activity. In this case, the SQL stored procedure that is executed by the lookup puts the single quotes around the database name so it is returned as ‘mydatabasename’.

{
"query": "AzureMetrics | where TimeGenerated between (datetime(@{item().TimeStart}) .. datetime(@{item().TimeEnd})) | where Resource == @{item().DatabaseName} | project SourceSystem , TimeGenerated , Resource, ResourceGroup , ResourceProvider , SubscriptionId , MetricName , Total , Count , Maximum , Minimum , TimeGrain , UnitName , Type, ResourceId"
}

With dynamically populated queries like the above, string interpolation is your friend. Paul Andrew’s post on variable string interpolation in a REST API body helped me understand this and get my API request to produce the required results.

You can do similar things with Data Factory to query the Application Insights API. In fact, this blog post on the subject helped me figure out how to get the Log Analytics data I needed.

Be Aware of API Limits

There are limits to the frequency and amount of data you can pull from the Log Analytics API. As noted in the API documentation:

  • Queries cannot return more than 500,000 rows
  • Queries cannot return more than 64,000,000 bytes (~61 MiB total data)
  • Queries cannot run longer than 10 minutes (3 minutes by default)

Be aware that the Log Analytics API does not return an error if you hit the limit on rows or bytes. It returns the results up to the limit and then notes the “partial query failure” in the result set. As far as I can tell, there is no option for pagination, so you will need to adjust your query to keep it under the limits. My current process uses a Get Metadata activity after the copy activity to check file sizes for anything close to the limit, and then breaks that query into smaller chunks and re-executes it.
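If you want to estimate up front whether a query will approach the row limit, another option (not something the API does for you, just a pattern you could add) is to run a count-only version of the query first, and only run the full query, or smaller time slices of it, when the count is comfortably under the limit:

AzureMetrics 
| where TimeGenerated between (datetime('2020-12-15') .. datetime('2020-12-16')) 
| where Resource == 'mydatabasename' 
| summarize RowCount = count()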

It’s All in the Details

I had a lot of trial and error as I worked my way through populating the request body in the API call and dealing with API limits. I hope this helps you avoid some of the pitfalls.

Microsoft Technologies, Power BI, Workout Wednesday

Workout Wednesdays for Power BI in 2021

I’m excited to announce that something new is coming to the Power BI community in 2021: Workout Wednesday!

[Image: Workout Wednesday logo]

Workout Wednesday started in the Tableau community and is expanding to Power BI in the coming year. Workout Wednesdays present challenges to recreate a data-driven visualization as closely as possible. They are designed to help you improve your skills in Power BI and Tableau.

How You Can Participate

Watch for the Power BI challenge to be published on Wednesdays in 2021. The challenge will contain requirements and a dataset. Use the dataset to create the desired end result.

Then share your workout! You can post your workout to the Data Stories Gallery or your blog, or just share a public link. If you aren’t able to share a public link – perhaps because that option is disabled in your Power BI tenant or you don’t have a Power BI tenant – a gif, a video, or even some screenshots are just fine.

To formally participate: Post to Twitter using both the #WOW2021 and #PowerBI hashtags along with a link/image/video of your workout. Include a link to the challenge on the Workout Wednesday site. And please note the week number in your description, if possible.

Community Growth

I’m looking forward to Workout Wednesdays for a couple of reasons. First, I think Power BI needs more love in the data visualization department. We need to be talking about effective visualization techniques and mature past ugly pie charts and tacky backgrounds. I think Workout Wednesdays will help us individually grow those skills, but they will also foster more communication and sharing of ideas around data visualization in Power BI. That in turn will lead to more product enhancement ideas and conversations with the Power BI team, resulting in a better product and a stronger community.

Second, I’m excited to see the cross-pollination and cross-platform learning we will achieve by coming together as a data visualization community that isn’t focused on one single tool. There is a lot Tableau practitioners and Power BI practitioners can learn from each other.

Join Me In January

Keep an eye on Twitter and the Workout Wednesday website for the first challenge, coming January 6. While it would be great if you did the workout every single week, don’t be concerned if you can’t participate every week. A solution will be posted about a week later, but nothing says you can’t go back and do workouts from previous weeks as your schedule allows.

I look forward to seeing all of your lovely Workout Wednesday solutions next year!