Linux: filter the lines of all files in a folder and collect them in a single file

I have a folder with files like:

pe1_file1.txt
pe1_file2.txt
px1_file3.txt
px1_file4.txt  etc

Every file has lines such as:
1123343 34 SDSD XV34 nameofdatabase 34 45455 4545 
1145343 33 SD34 XT45 nameofdatabase 34 45455 4545

I would like to parse all the files in that folder (there are actually a bunch of folders) and build a single text file containing every line from those files that meets a particular condition. The resulting file should contain only the first 5 values (up to nameofdatabase) AND the first 3 letters of the file name.

I usually use a modified version of the code below. As written, it passes through every filtered line with all of its values. I want to omit the last three numbers and add "pe1" or "px2" as the first value.

for FILE in files/*.txt;
do
  firstchar=${FILE:0:4}
  # how do I modify the next line in order to add $firstchar ("pe1") and $1,$2,$3,$4,$5,$6 ???
  awk '$3=="SDSD"&&$4=="cardatabase"' $FILE.txt >> TOTAL.txt
done

CodePudding user response：

No loop is required:

$ awk '$3=="SDSD" && $4=="cardatabase" {
           print substr(FILENAME, 1, 4), $1, $2, $3, $4, $5, $6}'  files/*.txt > total.txt

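Note that substr indexing starts at 1, and that the opening brace has to sit on the same line as the pattern; otherwise awk treats the pattern and the action as two separate rules. Based on your sample input, $4 should most likely be $5. Also, FILENAME holds the path exactly as it was passed to awk (files/pe1_file1.txt here), so substr(FILENAME, 1, 4) returns "file" unless the directory part is stripped first.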
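
Putting those notes together, a minimal sketch might look like the following. It assumes the filter is meant to be on fields 3 and 5 (SDSD and nameofdatabase, as in the sample lines) and that only the first five fields are wanted. The scratch directory, the `name` and `prefix` variables, and the px1 sample row are made up so the example runs on its own.

```shell
# Scratch data so the sketch is self-contained; rows mirror the sample in the question.
workdir=$(mktemp -d)
mkdir "$workdir/files"
printf '%s\n' '1123343 34 SDSD XV34 nameofdatabase 34 45455 4545' \
              '1145343 33 SD34 XT45 nameofdatabase 34 45455 4545' > "$workdir/files/pe1_file1.txt"
printf '%s\n' '1200001 35 SDSD XV99 nameofdatabase 34 45455 4545' > "$workdir/files/px1_file3.txt"

# Assumption: the condition is on fields 3 and 5, not 3 and 4.
awk '
  # At the first line of each input file, drop the directory part of
  # FILENAME and keep the first three characters of what is left.
  FNR == 1 { name = FILENAME; sub(/.*\//, "", name); prefix = substr(name, 1, 3) }

  # Print the prefix followed by the first five fields of matching lines.
  $3 == "SDSD" && $5 == "nameofdatabase" { print prefix, $1, $2, $3, $4, $5 }
' "$workdir"/files/*.txt > "$workdir/total.txt"

cat "$workdir/total.txt"
```

With the scratch data above, total.txt ends up with one pe1 line and one px1 line; the SD34 row is filtered out.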

CodePudding user response：

awk conveniently defines the variable FILENAME holding the current file name and provides the function substr to extract a substring:

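for FILE in files/*.txt;
do
  # FILE already ends in .txt; strip the directory from FILENAME before taking the 3-letter prefix
  awk '$3=="SDSD"&&$4=="cardatabase"{name=FILENAME; sub(/.*\//, "", name); print substr(name, 1, 3), $1, $2, $3, $4, $5}' "$FILE"
done > TOTAL.txt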

You can avoid opening the result file multiple times by only redirecting the complete output of the loop once.
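
If you keep the loop, the prefix can also be computed in the shell and handed to awk with -v. Here is a rough sketch that also covers the "bunch of folders" case with a two-level glob. The folder names run1 and run2, the sample rows, and the field-5 condition are assumptions for illustration only.

```shell
# Scratch layout with two hypothetical folders so the sketch runs on its own.
workdir=$(mktemp -d)
mkdir "$workdir/run1" "$workdir/run2"
printf '%s\n' '1123343 34 SDSD XV34 nameofdatabase 34 45455 4545' > "$workdir/run1/pe1_file1.txt"
printf '%s\n' '1145343 33 SD34 XT45 nameofdatabase 34 45455 4545' > "$workdir/run1/pe1_file2.txt"
printf '%s\n' '1300002 36 SDSD XT45 nameofdatabase 34 45455 4545' > "$workdir/run2/px1_file3.txt"

for file in "$workdir"/*/*.txt
do
  base=${file##*/}                    # strip the directory part
  prefix=$(printf '%.3s' "$base")     # first three characters of the file name
  # Assumption: filter on fields 3 and 5, keep the first five fields.
  awk -v prefix="$prefix" \
      '$3 == "SDSD" && $5 == "nameofdatabase" { print prefix, $1, $2, $3, $4, $5 }' "$file"
done > "$workdir/TOTAL.txt"           # the output file is opened once for the whole loop

cat "$workdir/TOTAL.txt"
```

TOTAL.txt is written one level above the globbed folders, so the loop never picks up its own output as input.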