grep not counting occurrences which span more than one line

Time:05-10

I am using grep to count the occurrences of a particular string in my code, but grep is not counting the occurrences which span more than one line.

I am trying to find occurrences of (` including the ones which look like

 (
       `

Basically, the backtick is on the next line.

So far I have tried:

 grep -roh -E "\(\s*\`" . | wc -l

But it does not count the occurrences that span two lines. Even

grep -roh -E "\(\n" . | wc -l        

gives 0.

What would be the solution to this?

CodePudding user response:

find . -type f -exec cat {} + | tr -d '[:space:]' | grep -oF '(`' | wc -l
  • find catenates contents of all files into a stream
  • tr reads stream and strips whitespace
  • grep outputs occurrences of the string (-o is GNU extension)
  • wc counts them
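
The pipeline above can be tried on a small inline sample before running it on a whole tree. This is only a sketch: the sample text and the variable names sample and count are made up for illustration. Note that because tr removes all whitespace, it also joins a parenthesis and a backtick that are separated by several newlines or blank lines.

```shell
# Hypothetical sample: one occurrence split across a line break,
# one with nothing in between, and one separated by a single space.
sample='foo (
    `bar`
baz (` qux ( `'

# Strip all whitespace, then count the literal two-character string.
count=$(printf '%s\n' "$sample" | tr -d '[:space:]' | grep -oF '(`' | wc -l)
echo "$count"
```

On this sample the count should be 3.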

CodePudding user response:

The following assumes the strings you want to count start with an opening parenthesis, followed by spaces and end with a backtick, with at most one newline in the spaces. We can use sed (tested with GNU sed) to remove the newlines before passing all this to grep and wc:

$ s='abc
text (   
    `
def
   text   (
   `
ghi ( ` (` jkl
'

$ sed -Ez ':a;s/(.*)\([[:blank:]]*\n[[:blank:]]*`(.*)/\1\(`\2/g;ta' <<< "$s"
abc
text (`
def
   text   (`
ghi ( ` (` jkl

$ sed -Ez ':a;s/(.*)\([[:blank:]]*\n[[:blank:]]*`(.*)/\1\(`\2/g;ta' <<< "$s" |
  grep -Eo '\(\s*`'
(`
(`
( `
(`

$ sed -Ez ':a;s/(.*)\([[:blank:]]*\n[[:blank:]]*`(.*)/\1\(`\2/g;ta' <<< "$s" |
  grep -Eo '\(\s*`' | wc -l
4

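The sed script uses the -z option, which makes sed treat NUL characters rather than newlines as line separators. Text files normally contain no NUL, so the whole input is processed as one record and the regular expression can match across line breaks. Each substitution replaces an opening parenthesis, optional blanks, one newline, optional blanks and a backtick with just an opening parenthesis followed by a backtick. Because the pattern begins and ends with a greedy (.*), each pass rewrites only one occurrence (the last one), so the ta branch loops for as long as a substitution was made.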
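
Since -z lets the pattern see the newlines directly, a shorter substitution without the (.*) groups and without the loop may be enough. The following is a sketch under that assumption (GNU sed and GNU grep), not the script from the answer; the variable names s and count are illustrative, and the sample is the one used above.

```shell
# Sample text with four occurrences; two of them are split across a line break.
s='abc
text (   
    `
def
   text   (
   `
ghi ( ` (` jkl'

# Join "(", optional blanks, one newline, optional blanks, "`" into "(`"
# in a single pass with the g flag, then count all matches.
count=$(printf '%s\n' "$s" |
  sed -Ez 's/\([[:blank:]]*\n[[:blank:]]*`/(`/g' |
  grep -Eo '\(\s*`' | wc -l)
echo "$count"
```

On this sample it should print 4, the same count as the looping version.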

To apply this on all files under the current directory you will need find to concatenate them and pipe to sed:

$ find . -type f -exec cat {} \; |
  sed -Ez ':a;s/(.*)\([[:blank:]]*\n[[:blank:]]*`(.*)/\1\(`\2/g;ta' |
  grep -Eo '\(\s*`' | wc -l
1257
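
To check the whole pipeline on something smaller than a real source tree, it can be run against a throwaway directory. The sketch below is illustrative only: the directory, the file names a.txt and b.txt and their contents are invented. It uses -exec cat {} + instead of \; so that find passes many files to each cat invocation. One caveat of concatenating files: a file ending in an opening parenthesis followed by a file starting with a backtick would be counted as an occurrence.

```shell
# Hypothetical demo tree: one multi-line occurrence and one single-line occurrence.
dir=$(mktemp -d)
printf 'f (\n  `x`\n' > "$dir/a.txt"
printf 'g (` h\n' > "$dir/b.txt"

# Concatenate all files, join split occurrences, and count the matches.
total=$(find "$dir" -type f -exec cat {} + |
  sed -Ez ':a;s/(.*)\([[:blank:]]*\n[[:blank:]]*`(.*)/\1\(`\2/g;ta' |
  grep -Eo '\(\s*`' | wc -l)

rm -rf "$dir"
echo "$total"
```

For these two files the total should be 2.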