Delete the line above a pattern

I'm trying to delete lines that contain a certain pattern and the line directly above this specific pattern in a file. The pattern is 'Query '. The file looks something like this:

1. Query= ENST00000641267.1
2. Query= ENST00000641448.1
3. Query= MSTRG.3294.1
4. Query= ENST00000435134.2
5. Query= ENST00000503142.1
6. Query= ENST00000503142.1
7. Query    8  THSLRYFRLGVSDPIHGVPEFISVGYVDSHPITTYDSVTQQKEPRAPWMAENLVPDHWER 187
8. Query  188  YTQLLKGWQQMFRVELKRQQRHYNHSGSHTYQRMIGCELLEDGSTTGFLQYAYDGQNFLI 367
9. Query  368  FNKDTLS*LAVDNVAHTIKRAREANQHELQYQKNWLEEECIA*LKRFLEYGKDTQQ 535
10. Query= ENST00000612670.1
11. Query    1  MVFTQAPAEIMGHLRICSLLARQCLAEFLGVFVLMLLTQGAVAQAVTSGETKGNFFTMFL 180
12. Query  181  AGSLAVTIAIYVGGNVSG 234
13. Query= MSTRG.3309.1

So lines 6 to 12 should be deleted, while all other lines should be preserved. I've tried the following to remove the line before the pattern, but can't get it to work:

tac | sed '/Query /'I,  1 d' | tac file.txt > newfile.txt

It just outputs the '>' sign. Can anyone help with this?

Desired output is:

    1. Query= ENST00000641267.1
    2. Query= ENST00000641448.1
    3. Query= MSTRG.3294.1
    4. Query= ENST00000435134.2
    5. Query= ENST00000503142.1
    13. Query= MSTRG.3309.1

Thanks!

CodePudding user response：

This might work for you (GNU sed):

sed '$!N;/\n.*Query /D;/Query /!P;D' file

Append the next line (unless the current line is the last line).

If the appended line contains 'Query ', delete the first line and go again.

If the first line of the two-line window contains 'Query ', don't print it.

Otherwise print the first of the 2 lines, delete it and go again.

N.B. Appending the next line is conditional on the current line not being the last, because the default behaviour of GNU sed is to print the pattern space if the N command is called to read past the end of the file. This allows the last line to be treated properly, i.e. if the last line contains 'Query ' it will be deleted.
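The steps above can be sketched as a commented, multi-line version of the same script. The four-line sample and the temporary file are assumptions made up for illustration, not part of the original data; only the third sample line contains 'Query ' followed by a space.

```shell
# Hypothetical four-line sample; only line 3 contains 'Query ' with a trailing space.
sample=$(mktemp)
printf '%s\n' 'Query= A' 'Query= B' 'Query  1 XX' 'Query= C' > "$sample"

result=$(sed '
# Append the next input line, unless this is already the last line.
$!N
# The second line of the window matches: drop the first line and restart.
/\n.*Query /D
# Nothing in the window matches: print the first line.
/Query /!P
# Remove the first line of the window and restart without reading input.
D
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On this sample, 'Query= B' is dropped because it sits directly above the matching line, and the matching line itself is never printed.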

CodePudding user response：

$ tac file | awk '/Query /{c=2} !(c&&c--)' | tac
1. Query= ENST00000641267.1
2. Query= ENST00000641448.1
3. Query= MSTRG.3294.1
4. Query= ENST00000435134.2
5. Query= ENST00000503142.1
13. Query= MSTRG.3309.1

See "Printing with sed or awk a line following a matching pattern" for more info.
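The one-liner above comes without an explanation, so here is one reading of it, written out with a named counter. The variable name skip and the sample data are assumptions for illustration. After tac reverses the file, the line above a match becomes the line after it, so a counter of 2 suppresses the match and its original predecessor. This sketch assumes tac (GNU coreutils) is available.

```shell
# Hypothetical four-line sample; only line 3 contains 'Query ' with a trailing space.
sample=$(mktemp)
printf '%s\n' 'Query= A' 'Query= B' 'Query  1 XX' 'Query= C' > "$sample"

result=$(tac "$sample" | awk '
  /Query / { skip = 2 }        # match: suppress this line and the one after it
  skip > 0 { skip--; next }    # still suppressing: use up one count, print nothing
  { print }                    # every other line passes through
' | tac)

printf '%s\n' "$result"
rm -f "$sample"
```

The condition !(c&&c--) in the original does the same thing in one expression: it is false, and decrements c, for as long as c is non-zero.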

CodePudding user response：

I would use GNU AWK in the following way. Let the content of file.txt be

1. Query= ENST00000641267.1
2. Query= ENST00000641448.1
3. Query= MSTRG.3294.1
4. Query= ENST00000435134.2
5. Query= ENST00000503142.1
6. Query= ENST00000503142.1
7. Query    8  THSLRYFRLGVSDPIHGVPEFISVGYVDSHPITTYDSVTQQKEPRAPWMAENLVPDHWER 187
8. Query  188  YTQLLKGWQQMFRVELKRQQRHYNHSGSHTYQRMIGCELLEDGSTTGFLQYAYDGQNFLI 367
9. Query  368  FNKDTLS*LAVDNVAHTIKRAREANQHELQYQKNWLEEECIA*LKRFLEYGKDTQQ 535
10. Query= ENST00000612670.1
11. Query    1  MVFTQAPAEIMGHLRICSLLARQCLAEFLGVFVLMLLTQGAVAQAVTSGETKGNFFTMFL 180
12. Query  181  AGSLAVTIAIYVGGNVSG 234
13. Query= MSTRG.3309.1

then

awk 'NR>1&&!/Query /&&prev!~/Query /{print prev}{prev=$0}END{if(prev!~/Query /){print prev}}' file.txt

output

1. Query= ENST00000641267.1
2. Query= ENST00000641448.1
3. Query= MSTRG.3294.1
4. Query= ENST00000435134.2
5. Query= ENST00000503142.1
13. Query= MSTRG.3309.1

Explanation: I use the prev variable to store the previous line. If neither the current line nor the previous line matches 'Query ', I print the previous line. Because printing always lags one line behind, the last line has to be handled separately, which is what the END block is for.

(tested in GNU Awk 5.0.1)
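For readers who prefer it spelled out, the same single-pass idea can be laid out over several lines with comments. The sample data is an assumption for illustration, and the NR > 0 guard in END is an addition not present in the original command; it only avoids printing an empty line when the input is empty.

```shell
# Hypothetical four-line sample; only line 3 contains 'Query ' with a trailing space.
sample=$(mktemp)
printf '%s\n' 'Query= A' 'Query= B' 'Query  1 XX' 'Query= C' > "$sample"

result=$(awk '
  # Print the held line only if neither it nor the current line matches.
  NR > 1 && !/Query / && prev !~ /Query / { print prev }
  # Hold the current line until the next one has been inspected.
  { prev = $0 }
  # The final line has no successor, so only its own content decides.
  END { if (NR > 0 && prev !~ /Query /) print prev }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the tac-based pipeline, this reads the file once and front to back, so it also works on input coming from a pipe.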