Removing lines depending upon keyword occurrence

Time:10-19

I have 7,000 files (sade1.pdbqt ... sade7200.pdbqt). Only some of them contain a second (or further) occurrence of the keyword TORSDOF. For any file that has a second occurrence, I want to remove all lines following the first occurrence, while preserving the file name. Could somebody please provide a sample snippet? Thank you.

$ cat FileWith2ndOccurance.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani
TORSDOF
Sushil
Kiran

After the function has run:

$ cat FileWith2ndOccurance.txt
ashu
vishu
jyoti
TORSDOF

EDIT1: A copy of an actual file:

REMARK  Name = 17-DMAG.cdx
REMARK  8 active torsions:
REMARK  status: ('A' for Active; 'I' for Inactive)
REMARK    1  A    between atoms: C_1  and  N_8
REMARK    2  A    between atoms: N_8  and  C_9
REMARK    3  A    between atoms: C_9  and  C_10
REMARK    4  A    between atoms: C_10  and  N_11
REMARK    5  A    between atoms: C_15  and  O_17
REMARK    6  A    between atoms: C_25  and  O_28
REMARK    7  A    between atoms: C_27  and  O_33
REMARK    8  A    between atoms: O_28  and  C_29
REMARK                            x       y       z     vdW  Elec       q    Type
REMARK                         _______ _______ _______ _____ _____    ______ ____
ROOT
ATOM      1  C   UNL     1       7.579  11.905   0.000  0.00  0.00     0.000 C 
ATOM      2  C   UNL     1       7.579  10.500   0.000  0.00  0.00     0.000 C 

ATOM     30  O   UNL     1       8.796   8.398   0.000  0.00  0.00     0.000 OA
ENDROOT
BRANCH  21  31
ATOM     31  O   UNL     1      13.701   7.068   0.000  0.00  0.00     0.000 OA
ATOM     32  C   UNL     1      12.306   6.953   0.000  0.00  0.00     0.000 C 
ENDBRANCH  41  42
ENDBRANCH  19  41
TORSDOF 8
REMARK  Name = 17-DMAG.cdx
REMARK  8 active torsions:
REMARK  status: ('A' for Active; 'I' for Inactive)
REMARK    1  A    between atoms: C_1  and  N_8
REMARK    2  A    between atoms: N_8  and  C_9
REMARK                            x       y       z     vdW  Elec       q    Type
REMARK                         _______ _______ _______ _____ _____    ______ ____
ROOT
ATOM      1 CL   UNL     1       0.000  11.656   0.000  0.00  0.00     0.000 Cl
ENDROOT
TORSDOF 0

CodePudding user response:

What I would do:

#!/bin/bash

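for file in sade*.pdbqt; do
    # Count the lines that start with the keyword
    count=$(grep -c '^TORSDOF' "$file")
    # Only rewrite files that have a second (or later) occurrence
    if ((count>=2)); then
        # Print up to and including the first TORSDOF line, then stop
        awk '/^TORSDOF/{print;exit}1' "$file" > /tmp/.torsdof &&
            mv /tmp/.torsdof "$file"
    fi
done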
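
A variant of the loop above, wrapped in a function so it can be tried on scratch files before touching the real ones. This is only a sketch under a few assumptions: the function name truncate_after_first_torsdof and the demo file names are made up for illustration, any count of two or more is treated as a match, and mktemp is used instead of a fixed temporary path.

```shell
#!/bin/bash

# Hypothetical helper: keep everything up to and including the first
# line starting with TORSDOF, but only if the keyword occurs twice or more.
truncate_after_first_torsdof() {
    local file=$1 count tmp
    count=$(grep -c '^TORSDOF' "$file")
    if (( count >= 2 )); then
        tmp=$(mktemp) || return 1
        # Write the kept part to a temporary file, then copy it back in
        # place so the original file keeps its name and permissions.
        awk '/^TORSDOF/{print; exit} 1' "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    fi
}

# Demo on scratch copies of the sample from the question.
demo_dir=$(mktemp -d)
printf '%s\n' ashu vishu jyoti TORSDOF Jatin Vishal Shivani TORSDOF Sushil Kiran \
    > "$demo_dir/two.txt"
printf '%s\n' ashu TORSDOF Jatin > "$demo_dir/one.txt"

truncate_after_first_torsdof "$demo_dir/two.txt"   # truncated
truncate_after_first_torsdof "$demo_dir/one.txt"   # left unchanged
cat "$demo_dir/two.txt"
```

Once the demo behaves as expected, the function could be called from a loop such as: for f in sade*.pdbqt; do truncate_after_first_torsdof "$f"; done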

CodePudding user response:

Something like this might work; it saves the first part of each file and outputs it in a new file when encountering a second TORSDOF:

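awk '
    {
        # Buffer each file from its first line up to and including
        # the first TORSDOF line
        if (FNR == 1) {
            count = 0
            save = $0
        } else if (count == 0)
            save = save "\n" $0
    }
    /^TORSDOF/ {
        ++count
        # On the second occurrence, write the buffered part to FILE.new
        if (count == 2) {
            outFile = FILENAME ".new"
            print save > outFile
            close(outFile)
        }
    }
' sade*.pdbqt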

for f in sade*.pdbqt.new
do
    mv "$f" "${f%.new}"
done
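
Before running either approach on all 7,000 files, it may help to list which files would actually be changed. The following is a minimal sketch that modifies nothing; the function name list_multi_torsdof and the scratch files in the demo are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: print the names of files in which a line starts
# with TORSDOF two or more times. No file is modified.
list_multi_torsdof() {
    # /dev/null is passed as an extra file so that grep prefixes every
    # count with the file name, even when only one real file is given.
    grep -c '^TORSDOF' /dev/null "$@" |
        awk -F: '$NF >= 2 { sub(/:[0-9]+$/, ""); print }'
}

# Demo on scratch files: only the first one has two occurrences.
demo_dir=$(mktemp -d)
printf '%s\n' a TORSDOF b TORSDOF c > "$demo_dir/sade1.pdbqt"
printf '%s\n' a TORSDOF > "$demo_dir/sade2.pdbqt"

list_multi_torsdof "$demo_dir"/sade*.pdbqt
```

On the real data the call would be list_multi_torsdof sade*.pdbqt, and the resulting list can be compared against the files that were rewritten.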