Need to exclude and match regex in bash

Time:07-24

I am new to bash and am trying to write a script that searches for some specific words in a codebase. Because there are a lot of false positives, I also need to maintain an exclude_pattern list so that anything matching it is ignored. Currently my script returns the correct matches, and the relevant line looks like this:

output=$(find $sourceDir -path "*/.git" -prune -o -type f \( -name "*.cpp" -o -name "*.h" \) -exec grep -E -H -i -R --color=always "$matching_regex" {} \; )

Now I am unable to take this output and run an exclude pattern on it. I tried something like this, but it did not work:

while IFS= read -r line
do
  foundFinal=$(grep -v "$exclude_matches" "$line")
done <<< "$output"

Maybe I do not need to do the exclude part separately and could do both the matching and the excluding in the first command itself, but so far I have been unsuccessful. It would be great to get any feedback or examples that show what I am missing or doing incorrectly. As already stated, I am a newbie with bash, so if grep is not the right command for my use case, please do not hesitate to say so.

CodePudding user response:

output=$(
    find "$sourceDir" \
        -name .git -prune \
      -o \
        -type f \( -name '*.cpp' -o -name '*.h' \) \
        -exec grep -E -H -i -- "$matching_regex" {} +
)
foundFinal=$(
    grep -E -v -- "$exclude_matches" <<<"$output"
)

Or, more efficiently, if you don't need the intermediate output variable, just pipe the two together:

foundFinal=$(
    find "$sourceDir" \
        -name .git -prune \
      -o \
        -type f \( -name '*.cpp' -o -name '*.h' \) \
        -exec grep -E -H -i -- "$matching_regex" {} + \
    | grep -E -v -- "$exclude_matches"
)
  • I simplified the git check
  • I replaced \; with + to reduce the number of invocations of grep
  • I removed -R (there is nothing to recurse into anyway, since find only passes regular files to grep)
  • I removed --color=always, which could interfere with the second grep
  • I added -E to the second grep to match the first one
  • I added -- to protect against regexes that start with a hyphen
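
To see the piped version working end to end, here is a minimal sketch that runs it on a throwaway directory. The sample files, their contents, and the two pattern values are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample tree; file names, contents and patterns are made up.
sourceDir=$(mktemp -d)
mkdir -p "$sourceDir/src" "$sourceDir/.git"
printf '%s\n' 'int password = 0;' '// TODO: remove secret' 'int x;' > "$sourceDir/src/a.cpp"
printf '%s\n' 'const char *SECRET_test_fixture;' > "$sourceDir/src/b.h"
printf '%s\n' 'password in git metadata' > "$sourceDir/.git/c.h"

matching_regex='password|secret'     # assumed include pattern
exclude_matches='test_fixture|TODO'  # assumed exclude pattern

foundFinal=$(
    find "$sourceDir" \
        -name .git -prune \
      -o \
        -type f \( -name '*.cpp' -o -name '*.h' \) \
        -exec grep -E -H -i -- "$matching_regex" {} + \
    | grep -E -v -- "$exclude_matches"
)

# Only the first line of src/a.cpp survives both filters.
printf '%s\n' "$foundFinal"
rm -rf "$sourceDir"
```

Note that the second grep sees the whole "file:line" text produced by -H, so an exclude pattern can also match part of a path, and that it is case-sensitive here because only the first grep has -i.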

If you want to colourize for display, you can re-run the grep on the (presumably not too long) result:

grep --colour=auto -E -i -- "$matching_regex" <<<"$foundFinal"

CodePudding user response:

Assuming matching_regex.txt contains all the regexes you want to include and exclude_matches.txt contains all the regexes you want to exclude, try:

find "$sourceDir" -path "*/.git" -prune -o -type f \
    \( -name "*.cpp" -o -name "*.h" \) \
    -exec grep -E -H -i -f matching_regex.txt {} + |
    grep -E -i -v -f exclude_matches.txt
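
As a rough sketch of how the two pattern files might look, the following builds them in a temporary directory and runs the same kind of pipeline. The directory layout, file contents, and patterns are assumptions made for the example. Each line of a file passed with -f is one pattern, and a line matches if any of the patterns matches.

```shell
#!/usr/bin/env bash
# Hypothetical work area; all names and contents below are illustrative.
work=$(mktemp -d)
mkdir -p "$work/src"
printf '%s\n' 'int password = 0;' 'char *api_key;' 'int mock_password;' > "$work/src/a.cpp"

# One regex per line in each pattern file.
printf '%s\n' 'password' 'api_key' > "$work/matching_regex.txt"
printf '%s\n' 'mock_' > "$work/exclude_matches.txt"

result=$(
    find "$work/src" -path '*/.git' -prune -o -type f \
        \( -name '*.cpp' -o -name '*.h' \) \
        -exec grep -E -H -i -f "$work/matching_regex.txt" {} + |
    grep -E -i -v -f "$work/exclude_matches.txt"
)

# The mock_password line is dropped; the other two matches remain.
printf '%s\n' "$result"
rm -rf "$work"
```

Be careful with blank lines in the pattern files: an empty pattern matches every line, so a stray blank line in exclude_matches.txt would make grep -v discard everything.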

CodePudding user response:

Using xargs & GNU Awk (untested)

# find the files, then let awk filter every line of every file
find "$sourceDir" -name .git -prune -o \
      -type f \( -name '*.cpp' -o -name '*.h' \) -print0 |
xargs -0 awk -v mc="$matching_regex" -v ex="$exclude_matches" '
    # print lines that match mc and do not match ex (line only, no filename)
    match($0, mc) && ! match($0, ex)
    # to display the filename and the line instead, use:
    #   match($0, mc) && ! match($0, ex) { print FILENAME, $0 }
'
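
A small self-contained sketch of the awk approach, run against a temporary directory, is shown below. The sample file and the two pattern values are hypothetical. This variant prints the file name and line number in a grep-like "file:line:text" form, and it uses only standard awk features.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; names and patterns are made up for illustration.
work=$(mktemp -d)
printf '%s\n' 'int password = 0;' 'int mock_password;' 'int x;' > "$work/a.cpp"

matching_regex='password'  # assumed include pattern
exclude_matches='mock_'    # assumed exclude pattern

result=$(
    find "$work" -name .git -prune -o \
        -type f \( -name '*.cpp' -o -name '*.h' \) -print0 |
    xargs -0 awk -v mc="$matching_regex" -v ex="$exclude_matches" '
        # keep lines matching mc but not ex; FNR is the line number within the file
        $0 ~ mc && $0 !~ ex { print FILENAME ":" FNR ":" $0 }
    '
)

printf '%s\n' "$result"
rm -rf "$work"
```

Unlike grep -i, these awk matches are case-sensitive, and awk -v interprets backslash escapes in the assigned value, so patterns containing backslashes may need extra escaping.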