Home > OS >  Azure AKS data ingestion optimization in log analytics
Azure AKS data ingestion optimization in log analytics

Time:03-21

I am using azure AKS for Mechine learning model deployment and it automatiaclly deploys models weekly

Now AKS produces more costs for log analytics data ingestion

We are working to optimize the data ingestion to the log analytics

i have two nodes in AKS

Somehow we can reduce some data ingestion. but when i see the data ingestion today for past 24 hour it again increases and when i try to see the nodes which produces billable data ingestion it shows one more filed which shows as 'deprecate field: see http://aka'

below i mentioned query and the query result for reference

query

find where TimeGenerated > ago(24h) project _BilledSize, Computer
| extend computerName = tolower(tostring(split(Computer, '.')[0]))
| where computerName != ""
| summarize TotalVolumeBytes=sum(_BilledSize) by computerName

query result

ComputerName                        TotalVolumeBytes

aks-agentpool-28198374-vmss000007   232,567,315 

aks-agentpool-28198374-vmss000001   617,340,843 

deprecated field: see http://aka    129,052 

computerName
deprecated field: see http://aka

TotalVolumeBytes
129052

Here aks-agentpool-28198374-vmss000007 and aks-agentpool-28198374-vmss000001 are my nodes

But i dont have any idea about 'deprecated field: see http://aka'

I am not handling ML model deployment(i dont know obviuosly) and i asked ML team and they also dont know how this came into this

I analyzed many documents and query but i could not get the answer about what it is and how to get rid of this.

Can anyone guide me on this what is this in my node list and how can i stop this?

CodePudding user response:

The value you are getting http://aka is probably part of a link to some microsoft documentation. You are truncating it though when you do tolower(tostring(split(Computer, '.')[0])).

Try add Computer to your summarize clause so that you can get the full link:

find where TimeGenerated > ago(24h) project _BilledSize, Computer
| extend computerName = tolower(tostring(split(Computer, '.')[0]))
| where computerName != ""
| summarize TotalVolumeBytes=sum(_BilledSize) by computerName, Computer

Once you do this you discover that the value of the field Computer is Deprecated field: see http://aka.ms/LA-Usage

Visiting that link tells you that the Usage table contains "Hourly usage data for each table in the workspace." And in the column reference table you can see that there are a number of columns that are deprecated, for example "Computer".

I.e The computer field is being deprecated and will later be removed from the usage table. It is still there, but the value of the field will always be Deprecated field: see http://aka.ms/LA-Usage , until it is removed indefinitely.

This just means that the Computer field in the Usage table should no longer be used and will be removed. The Usage table alone can no longer be used to determine which Computer incurred which cost.

You can query the usage table with the following query and see that the Computer field always conatains the deprecation message.

Usage
| summarize sum(_BilledSize) by Computer, SourceSystem, DataType, Type

EDIT:

If you are concerned with finding which resource contributes more to the cost of your workspace rather than which resource generates the most logs (which do not necessarily need to be the same since some data is ingested for free in log analytics), please add _IsBillable as a filter to your query to exclude entries that are not billed:

find where TimeGenerated > ago(24h) project _BilledSize, Computer, _IsBillable
| extend computerName = tolower(tostring(split(Computer, '.')[0]))
| where computerName != "" and _IsBillable == true
| summarize TotalVolumeBytes=sum(_BilledSize) by computerName

Since the entries in the usage table where the field Computer is a deprecated field is not billable it will not show up.

  • Related