How do I use grep to count the number of occurrences of a string?
input:
.
├── a.txt
├── b.txt
// a.txt
aaa
// b.txt
aaa
bbb
ccc
Now I want to know how many times aaa and bbb appear.
output:
aaa: 2
bbb: 1
CodePudding user response:
Just an idea:
grep -E "aaa|bbb|ccc" *.txt | awk -F: '{print $2}' | sort | uniq -c
This means:
grep -E "...|..." : use extended regular expressions and print every line that matches any of the alternatives
The result is given as:
a.txt:aaa
b.txt:aaa
b.txt:bbb
b.txt:ccc
awk -F: '{print $2}' : split each result line into fields at the colon,
and print only the second field (the matched line, without the file name)
sort | uniq -c : sort the lines, then count how often each distinct line occurs
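One caveat with the pipeline above: grep prints whole lines, so a line containing a pattern twice is counted once, and a line that itself contains a colon is cut short by the awk step. A minimal sketch of a variant, assuming a grep that supports -o (print only the matched text) and -h (suppress file names), as GNU and BSD grep do. The scratch directory and the counts variable are only there to make the example self-contained:

```shell
# Recreate the question's sample files in a scratch directory (illustrative setup).
dir=$(mktemp -d)
printf 'aaa\n' > "$dir/a.txt"
printf 'aaa\nbbb\nccc\n' > "$dir/b.txt"

# -o prints each match on its own line and -h drops the "file:" prefix,
# so the output can go straight to sort | uniq -c.
# The final awk only reorders "count pattern" into "pattern: count".
counts=$(grep -ohE 'aaa|bbb' "$dir"/*.txt | sort | uniq -c | awk '{print $2 ": " $1}')
printf '%s\n' "$counts"

rm -rf "$dir"
```

With the question's two files this should print aaa: 2 and bbb: 1, which is the requested output format.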
CodePudding user response:
You can try awk. This uses split to count the occurrences of the search patterns and puts them in the associative array n.
$ awk 'BEGIN{ pat1="aaa"; pat2="bbb" }
{ n[pat1] += (split($0,arr,pat1)-1) }
{ n[pat2] += (split($0,arr,pat2)-1) }
END{ for(i in n){ print i":",n[i] } }' a.txt b.txt
aaa: 10
bbb: 14
Example data
$ cat a.txt
aaa
aaa efwepom dq
bbb qwpdo bbb
qwdo qwdpomaaa
qwo bbb
pefaaaomaaaewe bb aa
aaa bbb
$ cat b.txt
aaa
aaa efwepom dq
bbb qwpdo bbb
qwdo qwdpomaaa
qwo bbb
pebbb bbb fobbbmebbbwe bb aa
aaa bbb
bbbbbbsad
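The same per-line counting can also be sketched with gsub, which returns the number of substitutions it performed. This is an alternative to the split approach, not the answer's own method. The sample file below is made up for illustration, and a and b are arbitrary counter names:

```shell
# Small sample with several matches per line (illustrative data, not the files above).
dir=$(mktemp -d)
printf 'aaa bbb\npefaaaomaaaewe bb aa\nbbbbbbsad\n' > "$dir/sample.txt"

# gsub() returns how many replacements it made; replacing each match
# with itself ("&") leaves the line unchanged while yielding the count.
# Adding 0 in END forces a numeric 0 if a pattern never matched.
result=$(awk '{ a += gsub(/aaa/, "&"); b += gsub(/bbb/, "&") }
              END { print "aaa: " (a + 0); print "bbb: " (b + 0) }' "$dir/sample.txt")
printf '%s\n' "$result"

rm -rf "$dir"
```

Matches are counted without overlap, so bbbbbbsad contributes two occurrences of bbb, the same way split treats it.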
CodePudding user response:
Assuming you do not want the counts separated by file name, and since you specified grep, I'd loop over the patterns. The catch is that grep -c counts matching lines, so a pattern that appears more than once on a single line is still counted only once.
for pattern in aaa bbb; do
    printf "%s: " "$pattern"
    cat a.txt b.txt | grep -Ec "$pattern"
done
aaa: 2
bbb: 1
awk is cleaner though.
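If several matches per line need to be counted while staying with grep, one possible workaround is to print each match on its own line and count those lines instead. A rough sketch, assuming a grep with -o and -h (GNU and BSD grep have both). The helper name count_all and the sample files are hypothetical:

```shell
# Illustrative files in a scratch directory; a.txt has two matches on one line.
dir=$(mktemp -d)
printf 'aaa bbb aaa\n' > "$dir/a.txt"
printf 'bbb\nccc\n' > "$dir/b.txt"

# count_all is a hypothetical helper wrapping the loop so its output can be captured.
count_all() {
    for pattern in aaa bbb; do
        # -o emits one line per match, so wc -l counts occurrences rather than lines.
        # -F treats the pattern as a fixed string; -h omits the file name prefix.
        n=$(grep -ohF -- "$pattern" "$dir/a.txt" "$dir/b.txt" | wc -l)
        # The arithmetic expansion strips padding that some wc versions add.
        printf '%s: %d\n' "$pattern" "$((n))"
    done
}

out=$(count_all)
printf '%s\n' "$out"

rm -rf "$dir"
```

Here grep -c would report 1 for aaa, because both matches sit on the same line, while this loop reports 2.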