grep: Search for multiple strings in files recursively, to find source code

I am fairly confident this can't be done with grep alone, unless it has features I don't know about. If that is the case, I am hoping some other Linux/Unix command-line tool will do the job.

This is a frequent problem when working with source code, so I am pretty sure there must be an adequate solution.

Problem:

I am working with some C++ source code, and I want to be able to grep for objects in my code to find the files containing the relevant information.

Here is a simple example:

Search for all files which contain matches for "MyClass" in the namespace "MYNAMESPACE".
Assume that although MyClass and MYNAMESPACE look like unique strings, in general they might not be.
In my case, the namespace "MYNAMESPACE" appears in hundreds of source files.
The actual name of the class I am searching for is "Parameter", which is such a generic word that it too appears in hundreds of files.

Here is what I want a grep-like tool to do:

Specify a list of words to search for
Return the list of files found where ALL words in the list of search words are found in the same file
Do this recursively to obtain all results in all files in a directory

Surely there is a way to do this? This is essentially a filtering problem: Take all the files found (recursively) inside a directory, and apply a filter to them for each of the words in the input list. Files pass the filter if they contain at least one instance of each word.

CodePudding user response：

Maybe grep piped to xargs grep?

]$ grep -rl "NAMESPACE" | xargs grep -l "Parameter"

With these four files:

]$ tail -n +1 *.txt
==> needle_1.txt <==
...NAMESPACE...
...
...Parameter...

==> needle_2.txt <==
...Parameter...
...
...NAMESPACE...

==> not_needle_1.txt <==
...
...NAMESPACE...
...

==> not_needle_2.txt <==
...
...Parameter...
...

placed in each sub-directory (including .) of:

.
├── dir_1
│   ├── dir_1
│   └── dir_2
└── dir_2
    ├── dir_1
    └── dir_2

the result is:

]$ grep -rl "NAMESPACE" | xargs grep -l "Parameter" | sort
dir_1/dir_1/needle_1.txt
dir_1/dir_1/needle_2.txt
dir_1/dir_2/needle_1.txt
dir_1/dir_2/needle_2.txt
dir_1/needle_1.txt
dir_1/needle_2.txt
dir_2/dir_1/needle_1.txt
dir_2/dir_1/needle_2.txt
dir_2/dir_2/needle_1.txt
dir_2/dir_2/needle_2.txt
dir_2/needle_1.txt
dir_2/needle_2.txt
needle_1.txt
needle_2.txt
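
The same chaining can be extended to any number of words. Below is a minimal sketch, not a definitive implementation: the function names files_with_all_words and filter_all are hypothetical, and it assumes GNU grep and GNU xargs (for -Z, -0 and -r). Passing NUL-terminated file names between the stages keeps paths containing spaces intact, and -F makes each word a literal string rather than a regex.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from any standard tool).
# Assumes GNU grep (-Z) and GNU xargs (-0, -r).

# filter_all WORD...
# Reads NUL-terminated file names on stdin and keeps only the files that
# contain every remaining WORD, applying one grep filter per word.
filter_all() {
  if [ "$#" -eq 0 ]; then
    # No words left to check: print the surviving names one per line.
    tr '\0' '\n'
    return
  fi
  local word="$1"
  shift
  # -l: list matching files, -Z: end each name with NUL, -F: fixed string.
  # xargs -r: do not run grep at all when no file names arrive.
  xargs -0 -r grep -lZF -e "$word" -- | filter_all "$@"
}

# files_with_all_words DIR WORD [WORD...]
# Prints every file under DIR that contains all of the given words.
files_with_all_words() {
  local dir="$1"
  local word="$2"
  shift 2
  grep -rlZF -e "$word" -- "$dir" | filter_all "$@"
}
```

For the example above this would be called as: files_with_all_words . NAMESPACE Parameter | sort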

CodePudding user response：

I suggest using a gawk script (gawk is the default awk on many Linux distributions). It scans each file once for all the words, reading each file as a single record.

Count matched words in file.

Print file name only if all words matched.

script.awk

BEGIN {
  RS="!@!@!@!@!@!@!@"; # set record separator to something unlikely to match, so each file is read entirely as a single record
  getline wordsListStr < wordsListFile; # read wordsListFile as a single string wordsListStr
  close(wordsListFile);
  sub(/\n+$/, "", wordsListStr); # drop trailing newlines so split() does not yield an empty word
  wordsListCount = split(wordsListStr, wordsListArr, "\n"); # split wordsListStr by newline into array wordsListArr, save array length in wordsListCount
  for (i in wordsListArr) wordsMatchArr[i] = 0; # reset array wordsMatchArr to 0
}
{ # for each file (read as a single record)
  for (i in wordsListArr) { # for each word in the list (i is the array index, not the word)
    if ($0 ~ wordsListArr[i]) wordsMatchArr[i] = 1; # if the word (treated as a regex) matched, mark it in wordsMatchArr
  }
}
ENDFILE { # post-processing of each file (ENDFILE is a gawk extension)
  for (i in wordsListArr) { # scan wordsListArr
    wordsMatchCountInFile += wordsMatchArr[i]; # count number of matched words
    wordsMatchArr[i] = 0; # reset wordsMatchArr for next file
  }
  if (wordsMatchCountInFile == wordsListCount) print FILENAME; # print current file if all words matched
  wordsMatchCountInFile = 0; # reset words counter in file
}

Testing files

input.1.txt

word1
word2
word3

input.2.txt

word1
word2

input.3.txt

word3
word7
word8

input.4.txt

word3
word3
word7
word8
word1
word1
word7
word2
word8

testing output:

awk -v wordsListFile=input.1.txt -f script.awk input.{2,3,4}.txt
input.4.txt

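scanning all C++ source files under the current directory (note that the command substitution splits on whitespace, so this assumes no spaces in file names)

awk -v wordsListFile=nameSpacesListFile.txt -f script.awk $(find . -type f -name "*.cpp")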
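
If gawk is not available (ENDFILE is a gawk extension), the same count-the-matched-words idea can be sketched in portable awk. This is an illustrative variant under stated assumptions, not the script above: the function name all_words_awk is hypothetical, it reads files line by line instead of as a single record, it matches each word as a literal string with index() rather than as a regex, and it assumes the words file is not empty.

```shell
#!/usr/bin/env bash
# all_words_awk WORDS_FILE FILE...
# Hypothetical helper: prints each FILE that contains every word listed
# in WORDS_FILE (one word per line, matched as a literal substring).
all_words_awk() {
  local words_file="$1"
  shift
  awk '
    # First input file is the word list: store each non-blank line.
    NR == FNR { if ($0 != "") words[++n] = $0; next }

    # First line of a new data file: report on the previous file, then reset.
    FNR == 1 { report(); prev = FILENAME }

    # Every data line: mark each word that has not been seen yet in this file.
    {
      for (i = 1; i <= n; i++)
        if (!(i in seen) && index($0, words[i])) { seen[i] = 1; hits++ }
    }

    # After the last line of the last file: report on that file too.
    END { report() }

    function report() {
      if (prev != "" && hits == n) print prev
      hits = 0
      split("", seen)  # portable way to clear an array
    }
  ' "$words_file" "$@"
}
```

With the testing files above, this would be called as: all_words_awk input.1.txt input.2.txt input.3.txt input.4.txt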