Removing duplicates in a string using grep

I have a log file containing modules and queries in this order:

com.ab
com.ab
com.ac
com.ad
com.ab
com.ac
com.ad

Hence I used the following grep command to remove duplicates:

grep -m1 'com.a' filename
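
This printed only the first matching line and then stopped, since -m1 tells grep to quit after the first match:

com.ab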

That is not the output I want; I want the duplicates removed so that each distinct match is printed once:

com.ab
com.ac
com.ad

How do I achieve the above output using grep?

CodePudding user response:

You can use

grep -F 'com.a' file | sort -u
awk '/com\.a/' file | sort -u
awk 'index($0, "com.a")' file | sort -u
awk 'index($0, "com.a") && !seen[$0]  ' file

Here, grep -F 'com.a' file searches for the fixed string com.a in file (the first awk variant searches every line for a match of the com\.a regex, and the index($0, "com.a") variants look for com.a as a literal substring), while sort -u sorts the output and returns only the unique values.
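
For example, with the sample log from the question, each of the sort -u pipelines prints:

$ grep -F 'com.a' file | sort -u
com.ab
com.ac
com.ad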

The awk 'index($0, "com.a") && !seen[$0]++' file solution is probably the best: everything is done in a single awk call, and only the unique lines that contain a com.a substring are printed.
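
A point worth adding: the !seen[$0]++ idiom keeps the first occurrence of each line, so unlike sort -u it preserves the original input order (which happens to match the sorted order for this sample):

$ awk 'index($0, "com.a") && !seen[$0]++' file
com.ab
com.ac
com.ad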

CodePudding user response:

Let me show you my favourite:

| sort | uniq

When you put this after some list (like cat filename), you get only the distinct values; the duplicates are removed.
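
With the sample log, that is simply:

$ cat filename | sort | uniq
com.ab
com.ac
com.ad

Note that this deduplicates every line rather than filtering on com.a, which makes no difference for this sample.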

The reason I'm using this is its flexibility: you can easily add a criterion to the sorting, like sort -k3 -n, and if needed you can count the number of duplicates by adding -c to the uniq command. Combined, | sort -k3 -n | uniq -c first orders your list numerically on the third column, then shows each duplicate group and how many times it occurs.
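
For instance, counting the duplicates in the sample log (a one-column file, so the -k3 -n part is dropped here):

$ sort filename | uniq -c
      3 com.ab
      2 com.ac
      2 com.ad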
