I have a log file containing modules and queries in this order:
com.ab
com.ab
com.ac
com.ad
com.ab
com.ac
com.ad
Hence I used the following grep command to remove duplicates:
grep -m1 'com.a' filename
But it did not give the correct output. I want the duplicates removed so that only the distinct patterns remain:
com.ab
com.ac
com.ad
How do I achieve the above output using grep?
CodePudding user response:
You can use
grep -F 'com.a' file | sort -u
awk '/com\.a/' file | sort -u
awk 'index($0, "com.a")' file | sort -u
awk 'index($0, "com.a") && !seen[$0] ' file
Here, grep -F 'com.a' file searches for the fixed text com.a in file (the awk '/com\.a/' command searches for a com.a substring on every line using the com\.a regex, and the index($0, "com.a") version searches for com.a as a literal string), and sort -u sorts the output and returns only the unique values.
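For example, assuming the sample lines from the question are saved in a file called modules.log (the name is just for illustration), the first pipeline gives exactly the requested output:
$ grep -F 'com.a' modules.log | sort -u
com.ab
com.ac
com.ad
Note that sort -u reorders the lines; here the sorted order happens to match the desired output.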
The awk 'index($0, "com.a") && !seen[$0] ' file
solution is probably the best, everything is done in a single awk
, see the online demo. Only those unique lines are printed that contain a com.a
substring.
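A minimal sketch of that single-awk variant, again on a hypothetical modules.log holding the sample data: it prints each matching line only the first time it is seen, without needing a separate sort, so the order of first occurrences is preserved.
$ awk 'index($0, "com.a") && !seen[$0]++' modules.log
com.ab
com.ac
com.ad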
CodePudding user response:
Let me show you my favourite:
| sort | uniq
When you put this after some command that produces a list (like cat filename), you only get the distinct values; the duplicates are removed.
The reason I use this is the flexibility: you can easily add a criterion to the sorting, like sort -k3 -n, and if needed you can count the number of duplicates by adding -c to the uniq command. Combined as | sort -k3 -n | uniq -c, this first orders your list numerically on the third column and afterwards shows each distinct line together with a count of how often it occurred.
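As an illustration on the question's sample data (again assuming it is stored in a hypothetical modules.log; the -k3 -n key only makes sense once the lines actually contain a third, numeric column), adding -c shows how often each distinct line occurred. The exact spacing of the counts may vary between uniq implementations:
$ sort modules.log | uniq -c
      3 com.ab
      2 com.ac
      2 com.ad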