How do I count distinct words in a document using a Linux command?

Time:10-01

For the data set below, I tried the uniq command but did not get a satisfactory result:

Meredith Norris Thomas;Regular Air;HomeOffice
Kara Pace;Regular Air;HomeOffice
Ryan Foster;Regular Air;HomeOffice

Code:

cat HomeOffice_sales.txt |tr " " "\n" | tr ";" "\n"| uniq -c

The result I got is wrong: Regular, Air, and HomeOffice each appear three times with a count of 1 (I expected a single line with a count of 3 for HomeOffice):

      1 Meredith
      1 Norris
      1 Thomas
      1 Regular
      1 Air
      1 HomeOffice
      1 Kara
      1 Pace
      1 Regular
      1 Air
      1 HomeOffice
      1 Ryan
      1 Foster
      1 Regular
      1 Air
      1 HomeOffice

CodePudding user response:

uniq only counts repeated lines that are adjacent in the input, so you need to sort before piping to uniq.

tr ' ;' '\n\n' < HomeOffice_sales.txt | sort | uniq -c

You don't need multiple tr commands; you can give tr a list of input characters and a matching list of output characters.
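
To see the whole thing end to end, here is a minimal sketch that rebuilds the sample data from the question and runs the pipeline on it. The temporary file stands in for HomeOffice_sales.txt, and the variable names (sales_file, word_counts, distinct_count) are only illustrative. The extra sort -rn and the sort -u | wc -l step are optional additions, not part of the command above.

```shell
# Recreate the three sample lines from the question in a temporary file
# (a stand-in for HomeOffice_sales.txt).
sales_file=$(mktemp)
cat > "$sales_file" <<'EOF'
Meredith Norris Thomas;Regular Air;HomeOffice
Kara Pace;Regular Air;HomeOffice
Ryan Foster;Regular Air;HomeOffice
EOF

# Turn spaces and semicolons into newlines (one word per line),
# sort so identical words become adjacent, then count each group.
# The final sort -rn lists the most frequent words first.
word_counts=$(tr ' ;' '\n\n' < "$sales_file" | sort | uniq -c | sort -rn)
printf '%s\n' "$word_counts"

# If only the number of distinct words is needed,
# sort -u keeps one copy of each word and wc -l counts them.
distinct_count=$(tr ' ;' '\n\n' < "$sales_file" | sort -u | wc -l)
echo "distinct words: $distinct_count"

# Clean up the temporary file.
rm -f "$sales_file"
```

With this input, Regular, Air, and HomeOffice are each reported with a count of 3, every name with a count of 1, and there are 10 distinct words in total.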
