How do I return the number of times each name appears in a file? [Command Line]


Given a file with an unspecified number of names, how do I count the number of times each of those names appears in the file without knowing the names being searched for?

Yes, awkward spacing of names between commas is part of the file's standard expected format.

Sample_Names.txt :

Adam, Bob ,Billy, Cassandra ,Cally , Catherine, George
Amanda, Bob , Cassandra , Harry, Julie
Adam, Bob ,Billy, Harry, Larry

I'm currently at this configuration for a command:

awk -F , '{for(i=1; i <= NF; i++) grep $i | wc -l;}' Sample_Names.txt

This returns:

awk: line 1: syntax error at or near wc

Successful execution of Command(s) or Shell script should return a file that looks like:

Adam 2
Amanda 1
Billy 2
 Bob  3
Cally  1
 Cassandra  2
 Catherine 1
 George 1
 Harry 2
 Julie 1
 Larry 1

or something similar

CodePudding user response:

With your awk, -F , sets the field separator to a comma alone, but you still need to deal with the surrounding spaces.
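
To see the difference, compare how the second field of the first line comes out with and without the optional spaces folded into the separator (a quick sketch against Sample_Names.txt):

# with -F , the surrounding spaces stay attached to the field
awk -F ',' 'NR == 1 {print "[" $2 "]"}' Sample_Names.txt
# prints: [ Bob ]

# folding optional spaces into the separator trims them off
awk -F ' *, *' 'NR == 1 {print "[" $2 "]"}' Sample_Names.txt
# prints: [Bob]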

If you want to run shell commands from awk, you need system().
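
For illustration only, a minimal sketch of the system() mechanics (it just echoes each field back through the shell and does not solve the counting problem; unquoted expansion like this also breaks on shell metacharacters):

# system() hands a command string to the shell and returns its exit status,
# not its output, which is why this route gets clumsy quickly
awk -F ' *, *' '{for (i = 1; i <= NF; i++) system("echo " $i)}' Sample_Names.txt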

But that's not necessary; you can do this in pure awk:

awk -F '[[:space:],]+' '
{
    for (i=1; i<=NF; i++) {
        names[$i]++
    }
}
END {
    for (i in names) {
        print names[i]"\t"i
    }
}'

You can pipe this to sort -rnk 1,1 to sort by frequency.
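
As a rough one-liner against the sample file from the question, that could look like:

# count each name, then sort by count, highest first
awk -F '[[:space:],]+' '{for (i = 1; i <= NF; i++) n[$i]++} END {for (x in n) print n[x] "\t" x}' Sample_Names.txt | sort -rnk 1,1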

If you have grep -o, there's also:

grep -Eo '[[:alpha:]]+' | sort | uniq -c | sort -rn -k1,1

This won't work with non-ASCII characters like ü in certain locales (e.g. LC_ALL=POSIX, LANG=C). It will split names on those characters.
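
A quick way to see that effect (exact behaviour depends on your grep build and locale data):

# in the C locale [[:alpha:]] only covers ASCII letters, so the bytes of ü
# act as separators; this likely prints J and rgen on separate lines
printf 'Jürgen\n' | LC_ALL=C grep -Eo '[[:alpha:]]+'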

You can split on delimiter characters instead, like the awk version above does, which is more flexible:

grep -Eo '[^[:space:],]+' | sort | uniq -c | sort -rn -k1,1

CodePudding user response:

Using GNU utils:

tr -s ',' '\n' < example.txt | sed 's/^[ ]*//; s/[ ]*$//' | sort | uniq -c
   2 Adam
   1 Amanda
   2 Billy
   3 Bob
   1 Cally
   2 Cassandra
   1 Catherine
   1 George
   2 Harry
   1 Julie
   1 Larry

Explanation:

tr -s ',' '\n' < example.txt <- replace all commas with newlines

sed 's/^[ ]*//; s/[ ]*$//' <- remove any whitespace before and after each name

sort | uniq -c <- sort the names, then count the occurrence of each name

--

You can also use awk to reorder the output if required, e.g.

tr -s ',' '\n' < example.txt | sed 's/^[ ]*//; s/[ ]*$//' | sort | uniq -c | awk '{print $2, $1}'
Adam 2
Amanda 1
Billy 2
Bob 3
Cally 1
Cassandra 2
Catherine 1
George 1
Harry 2
Julie 1
Larry 1

CodePudding user response:

I don't think you need awk for this; try simply adding the -o option to the grep command in your for loop. That should find each string match and output the matches one per line, which wc can easily handle.
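
One possible reading of that suggestion as a standalone shell loop (a sketch; it assumes the names contain no regex metacharacters and that no name is a substring of another):

# list the unique names, then count each one with grep -o | wc -l
for name in $(tr ',' '\n' < Sample_Names.txt | sed 's/^ *//; s/ *$//' | sort -u); do
    printf '%s %s\n' "$name" "$(grep -o "$name" Sample_Names.txt | wc -l)"
done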
