Removing lines following double occurrence of a keyword in some files

Time:10-19

I have 10,000 files (molecule1.pdbqt ... molecule10000.pdbqt). Only some of them contain a second occurrence of the keyword TORSDOF. For each such file, I want to remove the line containing the second occurrence of TORSDOF and every line after it, while preserving the file names. Can somebody please provide a sample snippet, if possible without loop(s)? Thank you.

$ cat inputExample.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani
TORSDOF
Sushil
Kiran
$ cat outputExample.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani

CodePudding user response:

You can use awk for this:

$ awk '/TORSDOF/&&c++>0 {next} 1' inputExample.txt
ashu
vishu
jyoti
TORSDOF
Jatin
Vishal
Shivani
Sushil
Kiran

Based on exactly the same question outside SO.
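
As the output above shows, skipping with next removes only the repeated keyword line; Sushil and Kiran are still printed. To also drop everything after the second occurrence, as the question asks, one option is to stop with exit instead. This is a minimal sketch, not the answerer's original command; the scratch directory and the tmpdir variable are only there to make it self-contained:

```shell
# Recreate the sample input from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' ashu vishu jyoti TORSDOF Jatin Vishal Shivani TORSDOF Sushil Kiran \
  > "$tmpdir/inputExample.txt"

# c counts the lines that contain TORSDOF. On the second such line awk
# exits before printing it, so that line and all later lines are dropped.
# The trailing 1 prints every line reached before the exit.
awk '/TORSDOF/ && ++c == 2 { exit } 1' "$tmpdir/inputExample.txt" \
  > "$tmpdir/outputExample.txt"

cat "$tmpdir/outputExample.txt"
```

A file with fewer than two TORSDOF lines never triggers the exit and is printed unchanged.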

CodePudding user response:

This could be done as

cat input.txt | tr '\n' '|' | sed 's/TORSDOF|//2g' | tr '|' '\n' > output.txt
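  • cat input.txt to print the file content
  • tr '\n' '|' to join all lines into a single line, with | in place of each newline
  • sed 's/TORSDOF|//2g' to delete the second and every later occurrence of the keyword line (the 2g flag combination is a GNU sed extension); the lines that follow it are kept
  • tr '|' '\n' to split the long line back into multiple lines
  • > output.txt to write the result to a file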
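
The commands above each read one file and write the result elsewhere, while the question asks about many files that must keep their names. One possible way to cover that without an explicit shell loop is to let find run a small script per file. This is a sketch under assumptions: the two sample files and the workdir variable are hypothetical stand-ins for the real molecule*.pdbqt directory, and the .tmp suffix is an arbitrary choice:

```shell
# Scratch directory with two hypothetical sample files: the first has a
# second TORSDOF line, the second has only one and must stay unchanged.
workdir=$(mktemp -d)
printf '%s\n' a TORSDOF b TORSDOF c d > "$workdir/molecule1.pdbqt"
printf '%s\n' x TORSDOF y > "$workdir/molecule2.pdbqt"

# find starts the sh script once per matching file. awk writes the
# truncated copy to a temporary file, which then replaces the original
# only if awk succeeded, so every file keeps its name.
find "$workdir" -type f -name 'molecule*.pdbqt' -exec sh -c '
  awk "/TORSDOF/ && ++c == 2 { exit } 1" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
' sh {} \;
```

Because each file is replaced by a freshly written copy, it is worth trying this on a backup first; ownership and permissions of the new files may differ from the originals.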