looping with grep over several files

Time:08-24

I have multiple files, /text-1.txt, /text-2.txt ... /text-20.txt, and I want to grep each one for two patterns and stitch the matches into one output file per input.

For example, I have:

 grep "Int_dogs" /text-1.txt > /text-1-dogs.txt
 grep "Int_cats" /text-1.txt > /text-1-cats.txt
 cat /text-1-dogs.txt /text-1-cats.txt > /text-1-output.txt

I want to repeat this for all 20 files above. Is there an efficient way to do this in bash, awk, etc.?

CodePudding user response:

#!/bin/bash
# [[ ... ]] is a bash construct, so this needs bash rather than plain sh.
count=1

next () {
[[ "${count}" -lt 21 ]] && main
[[ "${count}" -eq 21 ]] && exit 0
}

main () {
file="text-${count}"
grep "Int_dogs" "${file}.txt" > "${file}-dogs.txt"
grep "Int_cats" "${file}.txt" > "${file}-cats.txt"
cat "${file}-dogs.txt" "${file}-cats.txt" > "${file}-output.txt"
count=$((count + 1))
next
}

next
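
The same result can be had with an ordinary loop and no intermediate -dogs/-cats files. The following is a minimal sketch, assuming the inputs are named text-1.txt through text-20.txt in the current directory. The first few lines only create hypothetical sample data in a temporary directory so the example can run on its own.

```shell
#!/bin/bash
# Demo setup: hypothetical sample data, not taken from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Int_cats 1\nInt_dogs 2\nother 3\n' > text-1.txt
printf 'Int_dogs 4\nnothing 5\nInt_cats 6\n' > text-2.txt

# Walk through 1..20 and process every input file that exists.
i=1
while [ "$i" -le 20 ]; do
    src="text-${i}.txt"
    if [ -f "$src" ]; then
        # The braces group both greps so their combined output lands in
        # one file: all Int_dogs lines first, then all Int_cats lines.
        {
            grep "Int_dogs" "$src"
            grep "Int_cats" "$src"
        } > "text-${i}-output.txt"
    fi
    i=$((i + 1))
done
```

Grouping the two greps keeps the "dogs first, then cats" ordering of the original three commands, which a single grep with both patterns would not.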

CodePudding user response:

grep has some features you seem not to be aware of:

  1. grep accepts a list of files, but the output format changes. For a single file, the output contains only the matching lines, as in this example:

    cat text-1.txt
    I have a cat.
    I have a dog.
    I have a canary.
    
    grep "cat" text-1.txt
    I have a cat.
    

    For multiple files, each matching line is also prefixed with its filename. Let's add another text file:

    cat text-2.txt
    I don't have a dog.
    I don't have a cat.
    I don't have a canary.
    
    grep "cat" text-*.txt
    text-1.txt:I have a cat.
    text-2.txt:I don't have a cat.
    
  2. grep can search for several patterns at once using the -E switch (extended regular expressions). Separate the patterns with a pipe symbol. Matches are printed in the order they appear in the file, not grouped by pattern:

     grep -E "cat|dog" text-1.txt
     I have a cat.
     I have a dog.
    
  3. Combining the previous two points (egrep is equivalent to grep -E, but recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer spelling):

     grep -E "cat|dog" text-*.txt
     text-1.txt:I have a cat.
     text-1.txt:I have a dog.
     text-2.txt:I don't have a dog.
     text-2.txt:I don't have a cat.
    

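So, to redirect this to an output file, you can simply write the command below. Note that it collects the matches from all input files into one output file, each line prefixed with its filename. Pick an output name that does not match text-*.txt, otherwise a second run would pick up the earlier output as an input:

grep -E "cat|dog" text-*.txt > all-output.txt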
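
If the filename prefixes are not wanted in the combined file, the -h option (available in GNU and BSD grep) suppresses them. A small sketch using the two sample files from above; the temporary directory and the output name all-output.txt are assumptions made for the demo:

```shell
#!/bin/bash
# Demo setup: recreate the two sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'I have a cat.\nI have a dog.\nI have a canary.\n' > text-1.txt
printf "I don't have a dog.\nI don't have a cat.\nI don't have a canary.\n" > text-2.txt

# -h drops the "filename:" prefix grep adds when given several files.
# The output name deliberately does not match the text-*.txt glob.
grep -h -E "cat|dog" text-*.txt > all-output.txt
cat all-output.txt
```

The result holds the four matching lines in file order (text-1.txt first, then text-2.txt), each line in the position it had in its source file.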

CodePudding user response:

Assuming you're using bash, try this:

for i in $(seq 1 20); do rm -f "text-${i}-output.txt"; grep -E "Int_dogs|Int_cats" "text-${i}.txt" >> "text-${i}-output.txt"; done

Details

This one-line script does the following:

  • The original files are expected to follow this naming pattern:
    • text-<INTEGER_NUMBER>.txt - Example: text-1.txt, text-2.txt, ... text-100.txt.
  • The loop runs from 1 to <N>, where <N> is the number of files you want to process (20 here).
  • Warning: rm -f text-${i}-output.txt runs first and deletes any existing output file, so each run starts from a fresh output file.
  • grep -E "Int_dogs|Int_cats" text-${i}.txt matches lines containing either string, and >> text-${i}-output.txt appends them to an output file numbered like the input. Example: for text-5.txt, the file text-5-output.txt is created and contains the matching lines (if any), in the order they appear in the input.
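
The rm -f followed by >> can be collapsed into a single > redirection, which truncates an existing output file before writing. The sketch below also loops over whatever text-*.txt files exist instead of a fixed count; this is a variation on the one-liner above, not the same command. The sample file, the stale output file, and the temporary directory are hypothetical demo data.

```shell
#!/bin/bash
# Demo setup: one input file plus a leftover output from an earlier run.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Int_dogs a\nskip\nInt_cats b\n' > text-5.txt
printf 'stale\n' > text-5-output.txt

for src in text-*.txt; do
    case "$src" in
        *-output.txt) continue ;;   # skip results of earlier runs
    esac
    # ${src%.txt} strips the .txt suffix: text-5.txt -> text-5
    out="${src%.txt}-output.txt"
    # A single > truncates any old output, so no rm -f is needed.
    grep -E "Int_dogs|Int_cats" "$src" > "$out"
done
```

The case guard matters because the output files also match the text-*.txt glob and would otherwise be processed as inputs on a second run.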