Function to grep files containing arbitrary number of strings (boolean and)

Time:08-24

I'm hoping to write a bash script to grep files matching several strings.

I found this hard-wired solution (source: "Find files containing multiple strings"):

find . -type f -exec grep -l 'string1' {} \; | xargs grep -l 'string2' | xargs grep -l 'string3' | xargs grep -l 'string4'

This returns all files containing string1 && string2 && string3 && string4.

How can I convert this into a bash shell function fn that takes an arbitrary number of arguments? I'd like fn string1 string2 string3 string4 ... to give identical results. Presumably this would involve looping through the arguments and piping the results to successive xargs grep -l ${!i} commands.

CodePudding user response:

If your grep supports the -P (PCRE) option, how about:

# grep files containing arbitrary number of strings
fn() {
    local dir=$1                        # directory to search
    shift
    local -a patterns=("$@")            # list of target strings
    local i                             # local variable
    local pat="(?s)"                    # single mode makes dot match a newline
    for i in "${patterns[@]}"; do
        pat+="$(printf "(?=.*\\\b%s\\\b)" "$i")"
    done
    find "$dir" -type f -exec grep -zlP "$pat" {} \;
}

# example of usage
fn . string1 string2 string3

If the passed word list is word1 word2, it generates the regex pattern "(?s)(?=.*\bword1\b)(?=.*\bword2\b)", which matches a file containing both word1 and word2 in any order.

  • (?s) turns on PCRE's "single-line" mode, which makes a dot match any character, including a newline.
  • The -z option makes grep treat input lines as terminated by a NUL character instead of a newline, so a file that contains no NUL bytes is read as a single line.
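
To see what the loop builds, here is a minimal sketch that assembles the pattern on its own and tries it on a throwaway file. The helper name build_pattern and the sample words are made up for illustration. It assumes bash and, for the last step, a grep built with -P support. Words containing regex metacharacters are not escaped here.

```shell
#!/usr/bin/env bash
# build_pattern is a hypothetical helper, not part of the answer's fn.
build_pattern() {
    local pat='(?s)' w
    for w in "$@"; do
        pat+="(?=.*\\b${w}\\b)"     # append one lookahead per word
    done
    printf '%s\n' "$pat"
}

pat=$(build_pattern word1 word2)
printf '%s\n' "$pat"                # (?s)(?=.*\bword1\b)(?=.*\bword2\b)

# word2 comes before word1 in this file; the lookaheads do not depend on order.
tmp=$(mktemp)
printf 'word2 here\nword1 there\n' > "$tmp"
grep -zlP "$pat" "$tmp" || echo "no match (or this grep lacks -P)"
rm -f "$tmp"
```

With a PCRE-enabled grep, the last command should list the temporary file, since both words occur somewhere in it.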

If grep -P is not available, here is an alternative using a loop:

fn() {
    local dir=$1                        # directory to search
    shift
    local -a patterns=("$@")            # list of target strings
    local i f fail                      # local variables

    while IFS= read -rd "" f; do        # loop over the files fed by "find"
        fail=0                          # flag to indicate match fails
        for i in "${patterns[@]}"; do   # loop over the target strings
            grep -q "$i" "$f" || { fail=1; break; }
                                        # if not matched, set the flag and exit the loop
        done
        (( fail == 0 )) && echo "$f"    # if all matched, print the filename
    done < <(find "$dir" -type f -print0)
}
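
The question's original idea, chaining one grep -l per string, can also be written as a loop that narrows a file list one pattern at a time. This is a rough sketch under assumptions, not a drop-in replacement for the functions above: fn_chain is a hypothetical name, and it relies on GNU grep (for -Z) and GNU xargs (for -r).

```shell
#!/usr/bin/env bash
# fn_chain is a hypothetical name; assumes GNU grep (-Z) and GNU xargs (-r).
fn_chain() {
    local dir=$1                    # directory to search
    shift
    local pat list
    list=$(mktemp) || return
    # start with every regular file, NUL-delimited so odd names survive
    find "$dir" -type f -print0 > "$list"
    for pat in "$@"; do
        # keep only the files that also contain this pattern;
        # a non-zero exit just means nothing matched, which is not an error here
        xargs -0 -r grep -lZ -- "$pat" < "$list" > "$list.next" || true
        mv "$list.next" "$list"
    done
    tr '\0' '\n' < "$list"          # print the survivors one per line
    rm -f "$list"
}
```

It would be called the same way as fn, for example fn_chain . string1 string2 string3. Each pass only reads the files that survived the previous one, which is the same narrowing the hard-wired pipeline performs.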