Kusto String Difference

Time:12-13

I need help finding the difference between two strings. For example, the difference between the strings outlook and outlooka should be "a". Returning just the number of characters that differ would also work.

I am also fine with converting the strings to arrays and calculating the set difference.

Any help is much appreciated. Thank you.

I am trying to identify homoglyph domains with minor changes.

CodePudding user response:

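This query counts the occurrences of each character in each string and returns the per-character differences (the count in str2 minus the count in str1). Because it only compares counts, character order is ignored.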

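datatable(id:int, str1:string, str2:string)
[
    1   ,"outlook"  ,"outlooka"
   ,2   ,"outlook"  ,"outlok"
   ,3   ,"outlook"  ,"outllooook"
   ,4   ,"outlook"  ,"lookout"
] 
// c = every character of str1 followed by every character of str2
// s = parallel array tagging each character with its source string ("1" or "2")
| mv-apply c = extract_all("(.)", strcat(str1, str2)) to typeof(string)
          ,s = array_concat(repeat("1", strlen(str1)), repeat("2", strlen(str2))) to typeof(string) on
 (
      // s is a string, so compare it against string literals
      summarize count_diff = countif(s == "2") - countif(s == "1") by c
      // keep only the characters whose counts differ
    | summarize char_diff = make_bag_if(bag_pack(c, count_diff), count_diff != 0)
 )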
id  str1     str2        char_diff
1   outlook  outlooka    {"a":1}
2   outlook  outlok      {"o":-1}
3   outlook  outllooook  {"o":2,"l":1}
4   outlook  lookout     {}
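
To sanity-check the table above outside Kusto, the same counting logic can be sketched in Python. This is only an illustration, not part of the original answer: the helper names char_diff and total_diff are made up here, and total_diff is one possible reading of "the number of characters that differ" from the question (the sum of the absolute per-character differences).

```python
from collections import Counter


def char_diff(str1: str, str2: str) -> dict:
    """Per-character count in str2 minus count in str1, omitting zeros.

    Hypothetical helper mirroring the char_diff column of the query.
    """
    counts = Counter(str2)
    # subtract() keeps negative results, unlike the "-" operator on Counters
    counts.subtract(Counter(str1))
    return {ch: n for ch, n in counts.items() if n != 0}


def total_diff(str1: str, str2: str) -> int:
    """Total number of added or removed characters (hypothetical helper)."""
    return sum(abs(n) for n in char_diff(str1, str2).values())


if __name__ == "__main__":
    for other in ["outlooka", "outlok", "outllooook", "lookout"]:
        print(other, char_diff("outlook", other), total_diff("outlook", other))
```

As row 4 shows, an anagram such as lookout yields an empty difference because only counts are compared. A swapped look-alike character would appear as one character with -1 and another with +1.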
