I have two tab-delimited files that I want to compare.
I want to look for values of col1 in file1 that are missing in file2 (prepend with "left") and values in col1 in file2 that are missing in file1 (prepend with "joined"). For these lines, I want to print col1 and col8. My diff command fails for lines with an equal col1 and different values for col8.
file1:
Char1 55 Necromancer Knight A 11/21/21 Zone Char1(Main) off off 0 Char1(Main)
Char2 28 Druid Squire A 12/08/21 Zone Char1 off off 0 Char1
Char3 44 Enchanter Recruit A 08/07/21 Zone Char3(Main) off off 0 Char3(Main)
Char4 56 Enchanter Knight A 11/06/21 Zone Char4(Main) off off 0 Char4(Main)
Char5 10 Magician Recruit A 10/29/21 Zone Char1 off off 0 Char1
Char6 65 Druid Champion A 12/12/21 The Lair of the Splitpaw Char6(Main) VT emp time off off 0 Char6(Main) VT emp time
file2:
Char1 55 Necromancer Knight A 11/21/21 Zone Char1(Main) off off 0 Char1(Main)
Char2 28 Druid Squire A 12/08/21 Zone Char1 off off 0 Char1
Char3 44 Enchanter Recruit A 08/07/21 Zone Char3(Main) off off 0 Char3(Main)
Char4 56 Enchanter Knight A 11/06/21 Zone Char4(Main) off off 0 Char4(Main)
Char5a 10 Magician Recruit A 10/29/21 Zone Char1 off off 0 Char1
Char6 65 Druid Champion A 12/21/21 Zone Char6(Main) Emp/VT/Time off off 0 Char6(Main) Emp/VT/Time
diff command producing output:
diff --new-line-format="joined %L" --old-line-format="left %L" --unchanged-line-format="" <(cut -f1,8 "$file1" | sort) <(cut -f1,8 "$file2" | sort) | sort
Current output:
joined Char5a Char1
joined Char6 Char6(Main) Emp/VT/Time
left Char5 Char1
left Char6 Char6(Main) VT emp time
Desired output:
joined Char5a Char1
left Char5 Char1
Any help is greatly appreciated, thank you!
CodePudding user response:
Diff is for simple differences. A solution for your problem is easier with awk
.
You do not need to sort the files, just read both files and count how often the keys in column 1 occur.
While counting the key values prepare the potential output. When NR==FNR
you are in the first file and you want to use "left".
When both files are read, look for keys with only one occurance and print the prepared output for that line.
awk -F'\t' '
{
key[$1]
value[$1]=(NR==FNR ? "left " : "joined ") $1 "\t" $8
}
END {
for (i in key) {
if (key[i]==1) {
print value[i]
}
}
}' file1 file2
CodePudding user response:
I am not really versed with awk.
But I think it would be someting like this:
awk -F' ' 'NR==FNR{x[$1] ;next};x[$1] > 0' file1 file2
Checkout this post. There you'll find some more examples on txt file comparison
CodePudding user response:
Hope this python script gets what you are expecting :
#!/bin/bash
python -c '
import re, sys
def read_into_dict(regex, file):
return { re.findall(regex, line)[0]: line for line in open(file) }
regex, file1, file2 = sys.argv[1:]
dict1 = read_into_dict(regex, file1)
dict2 = read_into_dict(regex, file2)
keys1 = set(dict1.keys())
keys2 = set(dict2.keys())
for key in (keys1 - keys2):
print(f"left {dict1[key]}")
for key in (keys2 - keys1):
print(f"joined {dict2[key]}")
' '^(\S )' <(cut -f1,8 "$file1" | sort) <(cut -f1,8 "$file2" | sort) | sort