Regex Delete Lines with delimiter `<<<<<<< HEAD` and `=======` in Reverted Comm

Time:09-26

Using a bad regex, I accidentally deleted many lines I shouldn't have. After reverting the changes with Git, my Markdown files now look like this:

<<<<<<< HEAD
There was a sentence here:
There was a third line here.
=======
There was a sentence here:
There was a second line here.
There was a third line here.
There were any number of lines here.
>>>>>>> parent of <commit ID> (<commit msg>)

My request is to use <<<<<<< HEAD and ======= as delimiters and delete everything between them, including the delimiters themselves. I would delete the >>>>>>> parent of <commit ID> (<commit msg>) lines separately afterwards.

My regex (.*) to match multiple lines between the delimiters was unsuccessful. I am using [{[(*"'!->1-9a-zA-ZÀ-ŰØ-űø-ÿ] instead of a simple \w to cater for any line-opening character or word I might want to use. I have soft line breaks (two trailing spaces) after each sentence, if that matters. (If you can match everything between the delimiters, it might not even matter.)

Expected result:

There was a sentence here:
There was a second line here.
There was a third line here.
There were any number of lines here.
>>>>>>> parent of <commit ID> (<commit msg>)

As I said, I would deal with >>>>>>> parent of <commit ID> (<commit msg>) afterwards.
Also, there are not always two lines between the delimiters. The varying number of lines is what causes my issue.

CodePudding user response:

Instead of using a non-greedy match, you can use a negative lookahead to match the lines in between that do not consist solely of =======, which is more performant:

^<<<<<<< HEAD(?:\R(?!=======$).*)*+\R=======$

Explanation

  • ^ Start of the line (in multiline mode; otherwise start of the string)
  • <<<<<<< HEAD Match literally
  • (?: Non capture group
    • \R Match any unicode newline
    • (?!=======$) Negative lookahead, assert that the line is not =======
    • .* Match the whole line
  • )*+ Close the non capture group and optionally repeat it using a possessive quantifier
  • \R Match any unicode newline
  • ======= Match literally
  • $ End of the line (in multiline mode; otherwise end of the string)
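
For anyone scripting this rather than using an editor's search and replace, here is a minimal Python sketch of the same lookahead idea. It is an adaptation, not the exact pattern above: Python's `re` module has no `\R`, so the sketch assumes `\n` line endings, and it uses a plain `*` instead of the possessive quantifier so that it also runs on Python versions before 3.11. The names `CONFLICT_HEAD` and `drop_head_side`, the sample text, and its commit ID are made up for illustration. The trailing `\n?` is an addition so that no empty line is left behind after the deletion.

```python
import re

# Assumption: the file uses "\n" line endings (Python's re has no \R).
CONFLICT_HEAD = re.compile(
    r"^<<<<<<< HEAD"          # opening delimiter at the start of a line
    r"(?:\n(?!=======$).*)*"  # following lines that are not exactly =======
    r"\n=======$"             # the closing delimiter line
    r"\n?",                   # also consume the newline after it, if any
    re.MULTILINE,             # make ^ and $ match at line boundaries
)


def drop_head_side(text: str) -> str:
    """Delete every '<<<<<<< HEAD' ... '=======' block, delimiters included."""
    return CONFLICT_HEAD.sub("", text)


# Hypothetical sample; the commit ID and message are placeholders.
sample = (
    "Intro line.\n"
    "<<<<<<< HEAD\n"
    "There was a sentence here:  \n"
    "There was a third line here.  \n"
    "=======\n"
    "There was a sentence here:  \n"
    "There was a second line here.  \n"
    ">>>>>>> parent of abc1234 (example)\n"
)

cleaned = drop_head_side(sample)
print(cleaned)
```

The `>>>>>>> parent of ...` line is left in place on purpose, since the question says it will be removed in a separate pass.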


CodePudding user response:

After perusing an answer from Wiktor, I managed to find a match:

<<<<<<<[\d\D]*?=======

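The ? is important when a document contains more than one match: it makes the quantifier lazy, so each match stops at the first ======= that follows. Without it, a single match would run from the first <<<<<<< to the last ======= in the document, and you are liable to delete more than intended again.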
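
To see the difference the ? makes, here is a small Python sketch that runs the lazy and the greedy variant over a made-up document with two conflict blocks. The sample text and commit IDs are placeholders, and the trailing `\n?` is an addition to the pattern above so that the newline after ======= is removed as well.

```python
import re

# Hypothetical document with two conflict blocks and a line in between.
doc = (
    "<<<<<<< HEAD\nold A\n=======\nnew A\n>>>>>>> parent of 1111111 (first)\n"
    "keep me\n"
    "<<<<<<< HEAD\nold B\n=======\nnew B\n>>>>>>> parent of 2222222 (second)\n"
)

# Lazy: each match ends at the nearest ======= after its <<<<<<<.
lazy = re.sub(r"<<<<<<<[\d\D]*?=======\n?", "", doc)

# Greedy: one match spans from the first <<<<<<< to the last =======.
greedy = re.sub(r"<<<<<<<[\d\D]*=======\n?", "", doc)

print(lazy)
print("---")
print(greedy)
```

With the lazy pattern both blocks are cleaned independently and the "keep me" line survives. The greedy pattern swallows everything up to the last =======, including "keep me" and the restored lines of the first block.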
