I have a data set in Kusto and I would like to gather the id
s from each day where their latency
is greater than the 99th percentile for that day.
e.g I have a table like this:
id | latency | time |
---|---|---|
a | 23.4 | 1/2/2022 |
c | 1.4 | 1/2/2022 |
b | 25.6 | 1/3/2022 |
d | 2.3 | 1/3/2022 |
and I want the result table to just be
time | id |
---|---|
1/2/2022 | a |
1/3/2022 | b |
where a
and b
come from the 99th percentile of latency times for those days
What I would like to work:
let 99p= T
| summarize percentiles(latency, 99) by bin(time, 1d);
T
| where startofday(time) == 99p.time
| where latency > 99p.latency
| project time, id
Thanks for any insight, pretty new to kusto
CodePudding user response:
// This is not a part of the solution, only generation of a sample data set
let T = materialize (range id from 1 to 1000 step 1 | extend ['time'] = ago(rand()*10d), latency = round(rand()*100,2));
// The solution starts here
T | extend day = bin(['time'], 1d) | as T2
| summarize percentiles(latency, 99) by day
| join kind=inner T2 on day
| where latency >= percentile_latency_99
| summarize make_list(id) by day
day | list_id |
---|---|
2022-03-20T00:00:00Z | [298] |
2022-03-21T00:00:00Z | [642] |
2022-03-22T00:00:00Z | [200,504] |
2022-03-23T00:00:00Z | [726,885] |
2022-03-24T00:00:00Z | [590,975] |
2022-03-25T00:00:00Z | [107] |
2022-03-26T00:00:00Z | [836] |
2022-03-27T00:00:00Z | [68] |
2022-03-28T00:00:00Z | [527,859] |
2022-03-29T00:00:00Z | [172,874] |
2022-03-30T00:00:00Z | [274] |