Let's say I have a KQL query that uses several tables to retrieve the data. I need to write some code in C#, that will take all the tables used by a given KQL query, and put all those table names into a list.
Simply put: I need to analyze each KQL query to know from which tables it gets the data.
I already tried doing so by writing this code:
var query = "Table1 | project a ,b,c";
var code = KustoCode.Parse(query);
var parseCode = code.Analyze();
Console.WriteLine(parseCode.ResultType.Display.ToString());
But this doesn't return the tables names, but instead it returns the columns names that this query used, which is not what I want.
If you could help me solve this I would greatly appreciate it!
CodePudding user response:
This would seem to do the trick (based on .show queryplan
)
using System.Data;
using Newtonsoft.Json.Linq;
string query = @".show queryplan <| search in (database('*').*) * | summarize count() by $table";
string cluster = "https://help.kusto.windows.net/Samples";
using (var client = Kusto.Data.Net.Client.KustoClientFactory.CreateCslQueryProvider($"{cluster};Fed=true"))
{
using IDataReader reader = client.ExecuteQuery(query);
reader.Read();
reader.Read();
string relop_tree = reader.GetString(2);
JObject content = JObject.Parse(relop_tree);
IEnumerable<JToken> sources = content.SelectTokens("$.source.operands..[?(@.source.type == 'TableShardReference')].source");
foreach (JToken source in sources)
{
Console.WriteLine($"{source.Value<string>("database"), -50}{source.Value<string>("table")}");
}
}
ContosoSales SalesFact
ContosoSales Products
ContosoSales Customers
ContosoSales SalesTable
KustoMonitoringPersistentDatabase KustoMonitoringTable
SampleIoTData RawSensorsData
SampleIoTData TransformedSensorsData
SampleIoTData _MV_LatestSensorValue
SampleLogs RawSysLogs
SampleLogs TransformedSysLogs
SampleLogs TraceLogs
SampleLogs TargetTable
SampleMetrics RawServerMetrics
SampleMetrics TransformedServerMetrics
SampleMetrics _MV_TransformedMetricsDedup
SampleMetrics SQLServerLocation
SampleMetrics _MV_AvgPerfWithServersHierarchy
Samples StormEvents
Samples demo_make_series1
Samples demo_series2
Samples demo_series3
Samples demo_many_series1
Samples ConferenceSessions
Samples demo_make_series2
Samples demo_clustering1
Samples Covid19_Bing
Samples Covid19_map2
Samples US_States
Samples US_Counties
Samples irregular_ts
Samples _MV_DailyCovid19
Samples demo_prometheus
Samples PopulationData
Samples OccupancyDetection
Samples SamplePowerRequirementHistorizedData
CodePudding user response:
I don't believe this is knowable a priori as some table names can be resolved at query time and some queries (for example using find) can cover all/multiple tables that are not enumerated until query time.
You can find this information shortly after the fact if you turn on Table Usage Statistics in the 'Diagnostic settings' tab in the azure portal view of your ADX cluster. This will pump detailed information into an app insights resource that will show what table/extents get used.