I have a folder with files like:
pe1_file1.txt
pe1_file2.txt
px1_file3.txt
px1_file4.txt etc
Every file has lines such as:
1123343 34 SDSD XV34 nameofdatabase 34 45455 4545
1145343 33 SD34 XT45 nameofdatabase 34 45455 4545
I would like to parse all the files in that folder (actually there are a bunch of folders) and build a single text file that includes every line from those files that meets a particular condition. The resulting file should contain only the first 5 values (up to nameofdatabase) AND the first 3 characters of the file name.
I usually use a modified version of the following code. As written, it passes through all the filtered lines with all their values. I want to omit the last three numbers and add "pe1" or "px2" as the first value.
for FILE in files/*.txt;
do
firstchar=${FILE:0:4}
# how do I modify the next line in order to add $firstchar ("pe1") and $1,$2,$3,$4,$5,$6 ???
awk '$3=="SDSD"&&$4=="cardatabase"' $FILE.txt >> TOTAL.txt
done
CodePudding user response:
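No loop is required; awk can read all the files in a single call:
$ awk '$3=="SDSD" && $4=="cardatabase" {n=FILENAME; sub(/.*\//, "", n); print substr(n, 1, 3), $1, $2, $3, $4, $5}' files/*.txt > total.txt
Note that substr indexing starts at 1, and that the pattern and its opening brace must be on the same line, otherwise awk treats them as two separate rules. With files/*.txt, FILENAME holds the whole path, so the directory part is stripped before taking the first three characters. Most likely $4 should be $5, based on your sample input.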
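Since the question mentions several folders, here is a minimal sketch of the same single-call idea spread over more than one directory. The folder names data/a and data/b, the sample rows, and the choice of comparing $5 (rather than $4) against "cardatabase" are assumptions made for illustration only; adapt them to the real layout. The prefix is computed once per file with FNR==1 instead of on every matching line.

```shell
#!/bin/sh
# Build a tiny sample tree (hypothetical folder names and sample rows).
workdir=$(mktemp -d)
mkdir -p "$workdir/data/a" "$workdir/data/b"
printf '%s\n' '1123343 34 SDSD XV34 cardatabase 34 45455 4545' \
              '1145343 33 SD34 XT45 cardatabase 34 45455 4545' \
              > "$workdir/data/a/pe1_file1.txt"
printf '%s\n' '1200000 35 SDSD XV99 cardatabase 34 45455 4545' \
              > "$workdir/data/b/px1_file3.txt"

# FNR == 1 is true on the first line of each input file: strip the
# directory part of FILENAME and keep its first three characters.
# The second rule prints the prefix plus the first five fields of
# every line that satisfies the condition (assumed here to be on $5).
result=$(awk '
  FNR == 1 { name = FILENAME; sub(/.*\//, "", name); prefix = substr(name, 1, 3) }
  $3 == "SDSD" && $5 == "cardatabase" { print prefix, $1, $2, $3, $4, $5 }
' "$workdir"/data/*/*.txt)

printf '%s\n' "$result"
rm -rf "$workdir"
```

With the sample rows above, only the two lines whose third field is SDSD are kept, each prefixed with pe1 or px1. In real use the command substitution would be replaced by a plain redirection such as > total.txt.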
CodePudding user response:
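awk conveniently defines the variable FILENAME, which holds the current file name, and provides the function substr to extract a substring. FILENAME includes the directory part (files/), so it is stripped first, and $FILE already ends in .txt, so nothing is appended to it:
for FILE in files/*.txt
do
awk '$3=="SDSD"&&$4=="cardatabase"{n=FILENAME; sub(/.*\//, "", n); print substr(n, 1, 3), $1, $2, $3, $4, $5}' "$FILE"
done > TOTAL.txt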
You can avoid opening the result file multiple times by redirecting the complete output of the loop once, after done, instead of appending inside the loop.
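If you prefer to compute the prefix in the shell, as the question's firstchar variable attempts, it can be handed to awk with -v. This is only a sketch under assumptions: the sample files and rows are made up, the condition is placed on $5 to match the sample layout, and cut is used for the first three characters so the loop also runs under plain sh.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary "files" directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/files"
printf '%s\n' '1123343 34 SDSD XV34 cardatabase 34 45455 4545' \
              > "$workdir/files/pe1_file1.txt"
printf '%s\n' '1145343 33 SD34 XT45 cardatabase 34 45455 4545' \
              '1300000 36 SDSD XT77 cardatabase 34 45455 4545' \
              > "$workdir/files/px1_file3.txt"

for FILE in "$workdir"/files/*.txt
do
    name=${FILE##*/}                            # drop the directory part
    prefix=$(printf '%s' "$name" | cut -c1-3)   # first three characters
    # -v makes the shell value available as the awk variable "prefix".
    awk -v prefix="$prefix" \
        '$3 == "SDSD" && $5 == "cardatabase" { print prefix, $1, $2, $3, $4, $5 }' "$FILE"
done > "$workdir/TOTAL.txt"                     # one redirection for the whole loop

total=$(cat "$workdir/TOTAL.txt")
printf '%s\n' "$total"
rm -rf "$workdir"
```

The loop starts one awk process per file, so for many files the single-call form is usually faster; the loop is mainly useful when other per-file shell work is needed as well.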