I have 7,000 files (sade1.pdbqt ... sade7200.pdbqt). Only some of the files contain a second or later occurrence of the keyword TORSDOF. For a given file, if there is a second occurrence of TORSDOF, I want to remove all lines following the first occurrence, while preserving the file names. Can somebody please provide a sample snippet? Thank you.
$ cat FileWith2ndOccurance.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani
TORSDOF
Sushil
Kiran
After the function runs:
$ cat FileWith2ndOccurance.txt
ashu
vishu
jyoti
TORSDOF
EDIT1: Copy of an actual file:
REMARK Name = 17-DMAG.cdx
REMARK 8 active torsions:
REMARK status: ('A' for Active; 'I' for Inactive)
REMARK 1 A between atoms: C_1 and N_8
REMARK 2 A between atoms: N_8 and C_9
REMARK 3 A between atoms: C_9 and C_10
REMARK 4 A between atoms: C_10 and N_11
REMARK 5 A between atoms: C_15 and O_17
REMARK 6 A between atoms: C_25 and O_28
REMARK 7 A between atoms: C_27 and O_33
REMARK 8 A between atoms: O_28 and C_29
REMARK x y z vdW Elec q Type
REMARK _______ _______ _______ _____ _____ ______ ____
ROOT
ATOM 1 C UNL 1 7.579 11.905 0.000 0.00 0.00 0.000 C
ATOM 2 C UNL 1 7.579 10.500 0.000 0.00 0.00 0.000 C
ATOM 30 O UNL 1 8.796 8.398 0.000 0.00 0.00 0.000 OA
ENDROOT
BRANCH 21 31
ATOM 31 O UNL 1 13.701 7.068 0.000 0.00 0.00 0.000 OA
ATOM 32 C UNL 1 12.306 6.953 0.000 0.00 0.00 0.000 C
ENDBRANCH 41 42
ENDBRANCH 19 41
TORSDOF 8
REMARK Name = 17-DMAG.cdx
REMARK 8 active torsions:
REMARK status: ('A' for Active; 'I' for Inactive)
REMARK 1 A between atoms: C_1 and N_8
REMARK 2 A between atoms: N_8 and C_9
REMARK x y z vdW Elec q Type
REMARK _______ _______ _______ _____ _____ ______ ____
ROOT
ATOM 1 CL UNL 1 0.000 11.656 0.000 0.00 0.00 0.000 Cl
ENDROOT
TORSDOF 0
CodePudding user response:
What I would do:
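#!/bin/bash
for file in sade*.pdbqt; do
    # count the lines that start with TORSDOF
    count=$(grep -c '^TORSDOF' "$file")
    # act only on files with a second (or later) occurrence
    if ((count >= 2)); then
        # print up to and including the first TORSDOF line, then stop
        awk '/^TORSDOF/{print;exit}1' "$file" > /tmp/.torsdof &&
            mv /tmp/.torsdof "$file"
    fi
done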
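A slightly more defensive variant of the loop above, as a sketch only: it assumes bash, awk and mktemp are available, and the function name trim_torsdof is made up for illustration. It creates a unique temporary file next to the original instead of a fixed name in /tmp, and it skips cleanly when the glob matches nothing.

```shell
#!/bin/bash
# trim_torsdof is a hypothetical helper name, not part of any tool.
# It truncates one file after its first TORSDOF line, but only when
# the file contains two or more lines starting with TORSDOF.
trim_torsdof() {
    local file=$1 tmp
    # Leave files with fewer than two TORSDOF lines untouched.
    [ "$(grep -c '^TORSDOF' "$file")" -ge 2 ] || return 0
    # Unique temporary file beside the original, so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Print every line up to and including the first TORSDOF line, then stop.
    if awk '{print} /^TORSDOF/{exit}' "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

for f in sade*.pdbqt; do
    [ -e "$f" ] || continue   # the glob matched nothing
    trim_torsdof "$f"
done
```

Since mv replaces the original with the temporary file, the result carries the permissions mktemp gave it (owner-only read/write), so a chmod afterwards may be needed. Trying this on a copy of a few files first is probably wise.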
CodePudding user response:
Something like this might work: it saves the first part of each file and writes it to a new file when it encounters a second TORSDOF:
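awk '
{
    # accumulate lines up to and including the first TORSDOF of each file
    if (FNR == 1) {
        count = 0
        save = $0
    } else if (count == 0)
        save = save "\n" $0
}
/^TORSDOF/ {
    count++
    # on the second TORSDOF, write the saved part to FILENAME.new
    if (count == 2) {
        outFile = FILENAME ".new"
        print save > outFile
        close(outFile)
    }
}
' sade*.pdbqt
# replace each original with its truncated copy
for f in sade*.pdbqt.new
do
    [ -e "$f" ] || continue   # no file needed truncating
    mv "$f" "${f%.new}"
done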