I am fairly confident this can't be done with grep alone, unless there are features I don't know about. If that is the case, I am hoping there is some other Linux/Unix command-line tool that will do the job. This is a frequent problem when working with source code, so I am pretty sure there must be an adequate solution.
Problem:
I am working with some C source code, and I want to be able to grep for objects in my code to find the files containing the relevant information.
Here is a simple example:
- Search for all files which contain matches for "MyClass" in the namespace "MYNAMESPACE".
- Assume that although MyClass and MYNAMESPACE appear to be likely to be unique strings, in general they might not be.
- In my case, the namespace "MYNAMESPACE" appears in hundreds of source files.
- The actual name of the class I am searching for is "Parameter", which is such a generic word that it too appears in hundreds of files.
Here is what I want a grep-like tool to do:
- Specify a list of words to search for
- Return the list of files found where ALL words in the list of search words are found in the same file
- Do this recursively to obtain all results in all files in a directory
Surely there is a way to do this? This is essentially a filtering problem: Take all the files found (recursively) inside a directory, and apply a filter to them for each of the words in the input list. Files pass the filter if they contain at least one instance of each word.
CodePudding user response:
Maybe grep piped to xargs grep?
]$ grep -rl "NAMESPACE" | xargs grep -l "Parameter"
With these four files:
]$ tail -n +1 *.txt
==> needle_1.txt <==
...NAMESPACE...
...
...Parameter...
==> needle_2.txt <==
...Parameter...
...
...NAMESPACE...
==> not_needle_1.txt <==
...
...NAMESPACE...
...
==> not_needle_2.txt <==
...
...Parameter...
...
placed in each sub-directory (including .) of:
.
├── dir_1
│   ├── dir_1
│   └── dir_2
└── dir_2
    ├── dir_1
    └── dir_2
the result is:
]$ grep -rl "NAMESPACE" | xargs grep -l "Parameter" | sort
dir_1/dir_1/needle_1.txt
dir_1/dir_1/needle_2.txt
dir_1/dir_2/needle_1.txt
dir_1/dir_2/needle_2.txt
dir_1/needle_1.txt
dir_1/needle_2.txt
dir_2/dir_1/needle_1.txt
dir_2/dir_1/needle_2.txt
dir_2/dir_2/needle_1.txt
dir_2/dir_2/needle_2.txt
dir_2/needle_1.txt
dir_2/needle_2.txt
needle_1.txt
needle_2.txt
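The pipeline above handles two words; for an arbitrary word list it can be generalized by chaining one grep -l filter per word. A minimal sketch (the all_words helper name is mine, and it assumes filenames without whitespace, since xargs splits on it):

```shell
# all_words: print files under the current directory that contain ALL given words.
# Sketch only: filenames containing whitespace will break the xargs step.
all_words() {
    first=$1; shift
    files=$(grep -rl -- "$first" .)    # seed with files matching the first word
    for word in "$@"; do
        [ -n "$files" ] || break       # stop early once nothing survives
        files=$(printf '%s\n' "$files" | xargs grep -l -- "$word")
    done
    printf '%s\n' "$files"
}
```

For example, `all_words MYNAMESPACE Parameter` prints only the files containing both words.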
CodePudding user response:
I suggest a gawk (GNU awk, the standard awk on most Linux distributions) script.
It scans each file once for all the words (reading the whole file as a single record), counts the matched words per file, and prints the file name only if every word matched.
script.awk
BEGIN {
    RS = "!@!@!@!@!@!@!@"; # record separator unlikely to appear in any file, so each file is read as a single record
    getline wordsListStr < wordsListFile; # read the whole of wordsListFile into the single string wordsListStr
    close(wordsListFile);
    sub(/\n$/, "", wordsListStr); # drop a trailing newline so split() does not create an empty word
    wordsListCount = split(wordsListStr, wordsListArr, "\n"); # one word per line; array length saved in wordsListCount
    for (i in wordsListArr) wordsMatchArr[i] = 0; # reset the per-file match flags
}
{ # runs once per file, since each file is read as a single record
    for (i in wordsListArr) {
        if ($0 ~ wordsListArr[i]) wordsMatchArr[i] = 1; # flag word i as matched in this file
    }
}
ENDFILE { # gawk-specific: post-process each input file
    wordsMatchCountInFile = 0;
    for (i in wordsListArr) {
        wordsMatchCountInFile += wordsMatchArr[i]; # count the matched words
        wordsMatchArr[i] = 0; # reset for the next file
    }
    if (wordsMatchCountInFile == wordsListCount) print FILENAME; # print current file if all words matched
}
Testing files
input.1.txt
word1
word2
word3
input.2.txt
word1
word2
input.3.txt
word3
word7
word8
input.4.txt
word3
word3
word7
word8
word1
word1
word7
word2
word8
testing output:
awk -v wordsListFile=input.1.txt -f script.awk input.{2,3,4}.txt
input.4.txt
scanning all C++ source files under the current directory:
awk -v wordsListFile=nameSpacesListFile.txt -f script.awk $(find . -type f -name "*.cpp")