I need help finding the difference between two strings. For example, the difference between the strings outlook and outlooka should be "a"; the number of characters that differ would also work.
I am fine with converting the strings to arrays and calculating the set difference as well.
I am trying to identify homoglyph domains that differ by only minor changes.
Any help is much appreciated. Thank you.
Answer:
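This query counts the occurrences of each character in each string and returns, per character, the difference between the two counts (count in str2 minus count in str1). Characters with equal counts are omitted, so strings that are anagrams of each other, such as outlook and lookout, produce an empty result.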
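datatable(id:int, str1:string, str2:string)
[
    1 ,"outlook" ,"outlooka"
   ,2 ,"outlook" ,"outlok"
   ,3 ,"outlook" ,"outllooook"
   ,4 ,"outlook" ,"lookout"
]
// c: every character of str1 followed by every character of str2
// s: a parallel array tagging each character with its source string ("1" or "2")
| mv-apply c = extract_all("(.)", strcat(str1, str2)) to typeof(string)
          ,s = array_concat(repeat("1", strlen(str1)), repeat("2", strlen(str2))) to typeof(string) on
  (
    // per character: occurrences in str2 minus occurrences in str1
    summarize count_diff = countif(s == "2") - countif(s == "1") by c
    // keep only the characters whose counts differ
    | summarize char_diff = make_bag_if(bag_pack(c, count_diff), count_diff != 0)
  )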
| id | str1 | str2 | char_diff |
|---|---|---|---|
| 1 | outlook | outlooka | {"a":1} |
| 2 | outlook | outlok | {"o":-1} |
| 3 | outlook | outllooook | {"o":2,"l":1} |
| 4 | outlook | lookout | {} |
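
The question also mentions that the number of differing characters would be enough. A possible variation, offered only as a sketch, keeps the same per-character counting and then sums the absolute differences instead of building a property bag. The column name `diff_chars` is a hypothetical name chosen for this example, and the tags are compared as string literals on the assumption that `s` is expanded as a string.

```kusto
datatable(id:int, str1:string, str2:string)
[
    1 ,"outlook" ,"outlooka"
   ,2 ,"outlook" ,"outlok"
   ,3 ,"outlook" ,"outllooook"
   ,4 ,"outlook" ,"lookout"
]
| mv-apply c = extract_all("(.)", strcat(str1, str2)) to typeof(string)
          ,s = array_concat(repeat("1", strlen(str1)), repeat("2", strlen(str2))) to typeof(string) on
  (
    // per character: occurrences in str2 minus occurrences in str1
    summarize count_diff = countif(s == "2") - countif(s == "1") by c
    // total number of surplus or missing characters (hypothetical column name)
    | summarize diff_chars = sum(abs(count_diff))
  )
```

Like the original query, this is a count-based comparison that ignores character order, so anagram pairs such as outlook and lookout should come out as 0. For homoglyph detection that may or may not be what you want; an order-sensitive measure would need a different approach.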