What query language does Sentinel use?

Get full access to Microsoft Sentinel in Action - Second Edition and 60K+ other titles, with free 10-day trial of O'Reilly.

There's also live online events, interactive content, certification prep materials, and more.

Kusto Query Language is the language used across Azure Monitor, Azure Data Explorer and Azure Log Analytics (what Microsoft Sentinel uses under the hood). I have always found this visualization regarding KQL useful -

What query language does Sentinel use?

We want to use KQL to create accurate and efficient queries to find threats, detections, patterns and anomalies from within our larger data set.

The Anatomy of a KQL Query

Take the below query as an example

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
| project TimeGenerated, Location, IPAddress, UserAgent

When we run a query like this the first line tells Microsoft Sentinel which table to look for data in, so in this case we want to search the SigninLogs table, which is where Azure AD sign in data is sent to. You can see a list of tables here.

Microsoft Sentinel will then run through your query sequentially, so it will run each line one by one until it hits the end, or you have an error. So to breakdown our query line by line.

SigninLogs

So first we have chosen our SigninLogs table.

SigninLogs
| where TimeGenerated > ago(14d)

Next we tell Sentinel to look back at the last 14 days worth of data in this table.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"

Next we ask Sentinel to only find logs where UserPrincipalName is equal to "[email protected]"

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"

Then we look for only logs where the ResultType == 0, which is a successful logon to Azure AD.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"

Next we look for only signins to Microsoft Teams.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
| project TimeGenerated, Location, IPAddress, UserAgent

Our last line uses the project operator, to return only 4 fields from our logs, so we will only see the TimeGenerated, Location, IPAddress and UserAgent returned from our SigninLogs data.

That is how you build queries, now the basics.

The Basics

Time Basics

Microsoft Sentinel and KQL are highly optimized for time filters, so if you know the time period of data you want to search, you should filter the time range straight away. Retrieving the last 14 days of logs, then searching for a username like the below query -

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"

Is much more efficient than searching first for a username and then searching for the time period like this -

SigninLogs
| where UserPrincipalName == "[email protected]"
| where TimeGenerated > ago(14d)

KQL has many options for querying particular time periods.

SigninLogs
| where TimeGenerated > ago(14d)

As per the first example, this will search for the last 14 days.

SigninLogs
0

You can also do hours.

SigninLogs
1

And minutes.

KQL also supports querying between time ranges -

SigninLogs
2

This will find SigninLogs data between 14 days and 7 days ago.

SigninLogs
3

Between 14 hours and 7 hours ago.

SigninLogs
4

And between 14 minutes and 7 minutes ago.

Where Basics

Where is an operator you will use in basically every query you write. This is how you tell Microsoft Sentinel to hunt for specific data. Syntax is very important with the where operator. If we use our same example.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"

This will search our SigninLogs table, over the last 14 days, for exact matches where our UserPrincipalName equals [email protected]. In KQL == is case sensitive, so if you search for [email protected] and the username is actually [email protected], you won't get any results. The non case sensitive equivalent is =~

SigninLogs
6

This will find any matches for [email protected] regardless of case sensitivity.

Instead of equals, we can also use contains.

SigninLogs
7

This will find any log entries where the UserPrincipalName contains reprise_99, if you had [email protected] and [email protected] data, it would find both. The contains operator is not case sensitive, but you can use contains_cs to make it case sensitive.

You can use either startswith or endswith if you are searching for particular patterns.

SigninLogs
8

Both startswith and endswith are not case sensitive, but you can use the startswith_cs or endswith_cs to make them case sensitive.

If you are searching for full words (greater than four characters), in KQL you can use the has operator. Using 'has' is more efficient than 'contains' as the data is indexed for you.

SigninLogs
9

This will find any SigninLogs where the application display name has the word Teams in it, that could include "Microsoft Teams" and "Microsoft Teams Web Client", both satisfy the query.

If you are searching for multiple words you can use has_any or has_all.

SigninLogs
| where TimeGenerated > ago(14d)
0

This will return results where the application display name contains either "Teams" or "Outlook"

SigninLogs
| where TimeGenerated > ago(14d)
1

This will return results where the application display name has "Teams" and "Outlook".

If you don't know which fields to search in, you can also use wildcards, it is inefficient but may get you on the right track to find what you want.

SigninLogs
| where TimeGenerated > ago(14d)
2

This will search the SigninLogs table for any field that contains reprise_99.

A number of these options also support using ! to reverse the query and find results where it is not true.

SigninLogs
| where TimeGenerated > ago(14d)
3

This query would find all SigninLogs where the UserPrincipalName does not equal [email protected]

SigninLogs
| where TimeGenerated > ago(14d)
4

This query would find all SigninLogs where the UserPrincipalName does not contain reprise_99

SigninLogs
| where TimeGenerated > ago(14d)
5

This query would find SigninLogs where the application display name does not contain "Teams".

Project Basics

Project allows us to select which columns are returned in our query and in which order.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
| project TimeGenerated, Location, IPAddress, UserAgent

This query searches for SigninLogs data from the last 14 days, where the UserPrincipalname equals [email protected], where the ResultType is 0, where the application display name equals "Microsoft Teams", then for each match on that query it returns the TimeGenerated, Location, IPAddress and the UserAgent.

We can rename colums as part of the same function.

SigninLogs
| where TimeGenerated > ago(14d)
7

This returns the same data, but renames the columns to LogTime, SigninLocation, IP and Agent.

We can even manipulate the output inline with the project operator.

SigninLogs
| where TimeGenerated > ago(14d)
8

This returns the same data, but changes the TimeGenerated name to LocalTime and converts to a +5h time zone if you work in that time zone.

project-away is the opposite of project and will remove columns from your query.

SigninLogs
| where TimeGenerated > ago(14d)
9

In this query we remove UserAgent. Remember, if you remove a column you then can't access it later in your query.

Summarize Basics

Summarize produces a table that aggregates the content of your query. Summarize has a number of underlying aggregation functions. If we again take our example query, we can manipulate the results in various ways using summarize.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
0

This query will look up the SigninLogs table for any events in the last 14 days, for any matches for [email protected], where the result is a success (ResultType == 0) and then summarize those events by the application display name.

You can optionally name the result column.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
1

This returns the same data but updates the name of the returned column to AppCount.

Instead of a total count, you can summarize a distinct count.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
2

This will return a single record for each distinct application [email protected] signed into.

You can use the arg_max and arg_min functions to return either the newest or oldest record that matches your query.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
3

This query looks for all signin logs over the last 14 days, that have [email protected] as the UserPrincipalname, that are successful and then returns the latest record.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
4

This is the same but returns the oldest record.

You can use countif to provide logic to your summations.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
5

This summarizes the data into two new columns, TeamsLogons where the application display name has "Teams" and SharePointLogons where the application display name has "SharePoint"

You can further manipulate your data by telling KQL to place your data into time 'bins'.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
6

This returns the same data as our first summarize example and then groups that data into 1d bins.

You can combine these functions together where useful

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
7

This is a combination of our countif and bin functions, where we summarize based on our application display name and also place the results into 1d bins.

You can make a set of items within a query.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
8

This will output a list of applications that [email protected] has signed into to a list called AppList.

You can combine this with our time bin.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
9

This will make a list of applications that [email protected] has signed into separated into one list per day.

Render Basics

The render operator allows KQL to visualize data into different formats such as piecharts, time or areacharts and column and bar charts.

If we use our same example from our Signinlogs table, we can see how we can visualize the data in various ways.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
0

This query summarizes all the applications [email protected] has signed into in the last 14 days, then renders the output as a piechart.

What query language does Sentinel use?

You can also render as a column chart.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
1

What query language does Sentinel use?

Or a barchart.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
2

What query language does Sentinel use?

For time data, you first summarize your data into time 'bins' as outlined in the summarize section, and you can then visualize your data over a time period.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
3

This visualizes all signins by [email protected] per day over the last 14 days and displays it as a timechart.

What query language does Sentinel use?

You can also use render as an areachart.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
4

What query language does Sentinel use?

Column charts and bar charts can also be used with time data. You will get a column or bar per time 'bin' over your larger time period.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
5

This is the same query as our timechart, but rendered as a column chart with a column per day.

What query language does Sentinel use?

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
6

And a bar chart.

What query language does Sentinel use?

With column or bar charts you can have them stacked together (which is the default).

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
7

This query finds all the signins for our account, counts the signins per application, then creates a single column for each day.

What query language does Sentinel use?

If you want each application to have its own column you can set it to be unstacked.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
8

What query language does Sentinel use?

You can also rename the axis and title of your chart in line with KQL.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
9

What query language does Sentinel use?

You can combine logic in your summarize actions to build dynamic content for your render operator.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
0

This query searches all signins to your tenant, then counts three groups - one where the application display name has "Teams", one where the application display name has "OneDrive" and one where the application display name has "SharePoint" for each day over the last 14 days, then renders as an unstacked column chart.

What query language does Sentinel use?

Parse and Split Basics

Parse and split are two different ways to extend a string of data to multiple columns based on matches. A lot of logs ingested to Microsoft Sentinel may come in as a single long string (such as sysmon), parse and split allow you to manipulate them into readable data.

For these examples, we will use the following test data

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
1

This data is just a single string that looks like this.

What query language does Sentinel use?

We can parse out particular data matches with the following

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
2

This will parse all the data between Name= and , to a new column called 'DisplayName'.

What query language does Sentinel use?

You can parse out multiple columns within the same command by matching along the string

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
3

This parses three new columns - DisplayName, DomainSuffix and AzureADTenantId

What query language does Sentinel use?

Remembering that KQL runs its operations sequentially, once we parse we can then parse again on the newly created column.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
4

This further parses our domain to find the top level domain, in this case a .com

What query language does Sentinel use?

When using the parse operator, KQL will run through all your rows of data and return even results where there is no match. So depending on your data structure you could end up with many rows of empty data. If we expand our example data to include another row of data, with different names and run the same query you will see empty results.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
5

What query language does Sentinel use?

To combat this you can use the 'parse-where' operator, which only returns results where there are matches to your query.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
6

We can see we are back to a single result where we had a match on our parse.

What query language does Sentinel use?

Split separates a string of text into an array based on a delimiter. If we go back to our original test data, we can split based on the comma sign.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
7

We will be returned an array with our string split out.

What query language does Sentinel use?

Split is index aware so if your data is in the same order, you can split directly into new columns.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
8

If we know our data location within the string then we can split it directly into named columns.

What query language does Sentinel use?

Once we have split our data, we can query it as though it was structured from the outset. So if we add a second record to our data, then query on specifc matches we will find what we are after.

SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName == "[email protected]"
| where ResultType == "0"
| where AppDisplayName == "Microsoft Teams"
9

We get only one hit, where Name contains "Reprise99", our second record where Name contains "Reprise103" isn't found.

What query language does Microsoft Sentinel use?

KQL is the query language used to perform analysis on data to create analytics, workbooks, and perform hunting in Microsoft Sentinel. Learn how basic KQL statement structure provides the foundation to build more complex statements.

What database does Azure Sentinel use?

Azure Sentinel uses Azure Monitor which is built on a proven and scalable log analytics database that ingests more than 10 petabytes every day and provides a very fast query engine that can sort through millions of records in seconds.

How does Azure Sentinel collect data?

Microsoft Sentinel collects data into the Log Analytics workspace from multiple sources. Data from built-in data connectors is processed in Log Analytics using some combination of hardcoded workflows and ingestion-time transformations in the workspace DCR.

Is Sentinel a SIEM or soar?

Microsoft Sentinel is a scalable, cloud-native solution that provides: Security information and event management (SIEM) Security orchestration, automation, and response (SOAR)