I am using this to count the frequency of the word and in a text file using bash.
grep -ow -i "and" "$1" | wc -l
It counts every and in the file, including those that are part of hyphenated compounds such as jerry-and-jeorge. I want to ignore those and count only the standalone occurrences of and.
CodePudding user response:
With GNU grep, you can use the following command to count the and
words that have no hyphen immediately before or after them:
grep -ioP '\b(?<!-)and\b(?!-)' "$1" | wc -l
Details:
-P option enables the PCRE regex syntax
\b(?<!-)and\b(?!-) matches:
  \b - a word boundary
  (?<!-) - a negative lookbehind that fails the match if there is a hyphen immediately to the left of the current location
  and - a fixed string
  \b - a word boundary
  (?!-) - a negative lookahead that fails the match if there is a hyphen immediately to the right of the current location.
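To see what each lookaround contributes, here is a small sketch that runs three variants of the pattern over the same string. The sample text and the variable names (s, plain, behind, both) are made up for illustration and are not from the question; it assumes GNU grep built with PCRE support.

```shell
#!/bin/bash
# Hypothetical sample text (not from the original question).
s='salt-and-pepper, rock-and roll, and-or, and, And.'

# Plain whole-word match: counts every "and", hyphenated or not.
plain=$(grep -oiw 'and' <<< "$s" | wc -l)

# Lookbehind only: rejects an "and" that has a hyphen right before it.
behind=$(grep -oiP '\b(?<!-)and\b' <<< "$s" | wc -l)

# Lookbehind plus lookahead: rejects a hyphen on either side.
both=$(grep -oiP '\b(?<!-)and\b(?!-)' <<< "$s" | wc -l)

echo "plain=$plain behind=$behind both=$both"
```

The plain match should find 5 occurrences, the lookbehind-only variant 3 (it still accepts and-or), and the full pattern 2, so both assertions are needed if a hyphen on either side should disqualify the word.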
See the online demo:
#!/bin/bash
s='jerry-and-jeorge, and, aNd, And.'
grep -ioP '\b(?<!-)and\b(?!-)' <<< "$s" | wc -l
# => 3 (not 4)