I have already tried the solution here, but it gives me an empty file, even though my file does contain lines that appear only once.
I have a large text file (2GB) with a very long string on each line. For example:
AB02819380213. : (( 00 99 - MO:ASKDJIO*U* HIUGHUHAHUHHA AUCCGTCTTCTTTTTTA FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
a01219f8b
NJSAJDH*)8888- 99 100. - NKJJABHASDGASGYUOISADIJIJA TCTCTCTTTCTACACTAATCACAATACTACA FFFFFFFFFFF
a023129ab
NJSAJDH*)8888- 99 100. - NKJJABHASDGASGYUOISADIJIJA TCTCTCTTTCTACACTAATCACAATACTACA FFFFFFFFFFF
000axa2381a
AB02819380213. : (( 00 99 - MO:ASKDJIO*U* HIUGHUHAHUHHA AUCCGTCTTCTTTTTTA FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
The expected output here would be:
a01219f8b
a023129ab
000axa2381a
How can I do this in bash or Python?
CodePudding user response:
If you are not worried about the ordering of the output:
$ awk '{a[$0]++} END {for (i in a) if (a[i] == 1) print i}' file
000axa2381a
a01219f8b
a023129ab
The array a holds the count of occurrences of each line; at the end, only the lines whose count is 1 are printed.
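Since you also asked about Python: a minimal sketch of the same idea using collections.Counter (the function name and path handling here are my own). Unlike the awk version, this preserves the order in which the unique lines first appeared, because Counter keeps insertion order (Python 3.7+). Note that, like the awk approach, it holds every distinct line in memory, which should still be fine for a 2GB file unless nearly all lines are distinct.

```python
from collections import Counter

def unique_lines(path):
    """Return the lines of *path* that occur exactly once, in file order."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            # Strip the trailing newline so the last line (which may lack
            # one) compares equal to identical lines elsewhere in the file.
            counts[line.rstrip("\n")] += 1
    # Keep only lines seen exactly once; Counter preserves insertion order.
    return [line for line, n in counts.items() if n == 1]

if __name__ == "__main__":
    for line in unique_lines("file"):
        print(line)
```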