Removing lines following double occurrence of a keyword in some files

I have 10,000 files (molecule1.pdbqt ... molecule10000.pdbqt). Only some of them contain a second occurrence of the keyword TORSDOF. For each such file, I want to remove the line containing the second occurrence of TORSDOF and every line after it, while preserving the file names. Can somebody please provide a sample snippet, if possible without loop(s)? Thank you.

$ cat inputExample.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani
TORSDOF
Sushil
Kiran

$ cat outputExample.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani

CodePudding user response：

You can use awk for this:

$ awk '/TORSDOF/ && c++ > 0 {exit} 1' inputExample.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani

Based on exactly the same question outside SO.
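To apply the same idea to every file while keeping the file names, one option is to write the awk output to a temporary file and copy it back over the original. The following is a minimal sketch, not a definitive solution: the helper name truncate_at_second is hypothetical, the glob molecule*.pdbqt is taken from the question, and it does use a loop, which the question hoped to avoid. It relies only on POSIX awk and mktemp.

```shell
#!/bin/sh
# Hypothetical helper: cut one file at the second line containing TORSDOF.
truncate_at_second() {
    file=$1
    tmp=$(mktemp) || return 1
    # Print lines until the keyword is seen for the second time, then stop.
    if awk '/TORSDOF/ && ++c == 2 {exit} 1' "$file" > "$tmp"; then
        # Copy back instead of mv so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Assumed file naming, as in the question; files without a second
# occurrence are rewritten unchanged.
for f in molecule*.pdbqt; do
    [ -f "$f" ] || continue
    truncate_at_second "$f"
done
```

Trying this on a copy of the data first is advisable, since the files are overwritten.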

CodePudding user response：

This could also be done with tr and GNU sed, assuming the character | never appears in the file:

tr '\n' '|' < input.txt | sed 's/TORSDOF|/\n/2; s/\n.*//' | tr '|' '\n' > output.txt

tr '\n' '|' < input.txt joins the file into a single line, using | as the separator
s/TORSDOF|/\n/2 replaces the second occurrence of the keyword line with a newline marker
s/\n.*// deletes that marker and everything after it
tr '|' '\n' splits the long line back into a multi-line file
> output.txt writes the result to a file
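Since only some of the files contain the keyword twice, it can help to list those files before or after the cleanup. The sketch below is one way to do that, assuming POSIX grep; the helper name has_second_occurrence is hypothetical and the glob molecule*.pdbqt comes from the question.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file has two or more lines containing TORSDOF.
has_second_occurrence() {
    # grep -c prints the number of matching lines (0 if there are none).
    [ "$(grep -c 'TORSDOF' "$1")" -ge 2 ]
}

# Print the name of every file that still needs truncating.
for f in molecule*.pdbqt; do
    [ -f "$f" ] || continue
    if has_second_occurrence "$f"; then
        printf '%s\n' "$f"
    fi
done
```

After a successful cleanup, this loop should print nothing.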