Regex search to find and remove consecutive lines which end with same characters

Time:11-15

I need to write a regular expression search which will locate when a line ends with the same text as the preceding line, but does not have the same first 10 characters. So in this example:

[11:12:21] Hello this is Tom. How are you?
[11:14:08] Hello this is Tom. How are you?

. . . I would need to search for consecutive lines for which the text was the same after the time entered in brackets.

I know that this search:

FIND: ^.{11}(.*)$
REPLACE: $1

. . . will locate the first 11 characters and remove them.

This search:

FIND: ^((.{10}).*)(?:\r?\n\2.*)+
REPLACE: $1

. . . will locate consecutive lines whose first 10 characters are the same and keep only the first of them.

But I can't figure out how to structure the search so it checks the text from position 11 to the end of the line, and then checks if the text on the next line from the 11th character to the end of the line is the same.

CodePudding user response:

If each line starts with a part in square brackets and the text after that part should be the same on the next line, you can use 2 capture groups, so that the bracketed part of the first line is also kept when replacing:

^(\[[^][]*])(.*)(?:\r?\n\[[^][]*]\2)+

The pattern matches:

  • ^ Start of string (or of each line, when ^ matches at line starts, as in most editor searches)
  • (\[[^][]*]) Capture group 1, match from [...]
  • (.*) Capture group 2, match the rest of the line
  • (?: Non capture group
    • \r?\n Match a newline
    • \[[^][]*] Match from [...]
    • \2 Match the same as what was previously captured in group 2
  • )+ Close the non capture group and repeat it to match 1 or more lines

Replace with:

$1$2
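
For readers who want to try this outside a text editor, here is a minimal sketch in Python. It assumes the standard re module, uses re.MULTILINE so that ^ matches at the start of every line, and puts a + after the non capture group so one match can cover a whole run of repeated lines. The name collapse_repeats and the sample text are made up for illustration. Python writes the replacement as \1\2 rather than $1$2.

```python
import re

# re.MULTILINE makes ^ match at the start of every line, not only at the
# start of the whole string. The + after the group repeats it for runs.
DUPLICATE_TAIL = re.compile(r"^(\[[^][]*])(.*)(?:\r?\n\[[^][]*]\2)+", re.MULTILINE)


def collapse_repeats(text: str) -> str:
    """Keep only the first line of each run whose text after [...] repeats."""
    # Python spells the replacement backreferences \1\2 instead of $1$2.
    return DUPLICATE_TAIL.sub(r"\1\2", text)


sample = (
    "[11:12:21] Hello this is Tom. How are you?\n"
    "[11:14:08] Hello this is Tom. How are you?\n"
    "[11:15:30] Fine, thanks.\n"
)
print(collapse_repeats(sample))
```

On this sample the second line is dropped, while the third line is kept because its text differs.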

If the next line should also not have the same characters between the square brackets, you can add a negative lookahead (?!\1) after the newline to assert that what follows is not the value of the first capture group:

^(\[[^][]*])(.*)(?:\r?\n(?!\1)\[[^][]*]\2)+
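
The effect of the lookahead can be checked with a small Python sketch (again assuming the re module with re.MULTILINE and a + after the group; the helper name dedupe and both sample strings are hypothetical). A repeated line is removed only when its bracketed part differs from the one on the first line.

```python
import re

# (?!\1) fails when the next line starts with the same [...] part as group 1,
# so lines that are identical including the bracketed part are left alone.
WITH_LOOKAHEAD = re.compile(
    r"^(\[[^][]*])(.*)(?:\r?\n(?!\1)\[[^][]*]\2)+", re.MULTILINE
)


def dedupe(text: str) -> str:
    """Drop repeated lines only if their bracketed prefix differs."""
    return WITH_LOOKAHEAD.sub(r"\1\2", text)


same_stamp = "[11:12:21] Hello\n[11:12:21] Hello\n"
other_stamp = "[11:12:21] Hello\n[11:14:08] Hello\n"

print(dedupe(same_stamp) == same_stamp)  # identical prefixes: nothing removed
print(dedupe(other_stamp))               # different prefixes: second line removed
```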

If the first part should be exactly 10 characters instead of a bracketed part:

^(.{10})(.*)(?:\r?\n.{10}\2)+

And not with the same 10 characters at the start of the next line using a negative lookahead:

^(.{10})(.*)(?:\r?\n(?!\1).{10}\2)+
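
As a rough check of the 10-character variant, the sketch below runs it in Python on a made-up log (assumptions: the re module with re.MULTILINE, a + after the group so longer runs collapse in one match, and hypothetical names FIXED_WIDTH, log and cleaned). Note that the lookahead compares each following line against the first line of the run, not against the line directly before it.

```python
import re

# .{10} consumes the fixed-width prefix; (?!\1) rejects a following line
# whose first 10 characters equal those of the first line in the run.
FIXED_WIDTH = re.compile(r"^(.{10})(.*)(?:\r?\n(?!\1).{10}\2)+", re.MULTILINE)

log = (
    "[11:12:21] Hello this is Tom. How are you?\n"
    "[11:14:08] Hello this is Tom. How are you?\n"
    "[11:15:47] Hello this is Tom. How are you?\n"
    "[11:16:02] I am fine.\n"
)

# Replace each run with its first line (prefix plus text).
cleaned = FIXED_WIDTH.sub(r"\1\2", log)
print(cleaned)
```

The three repeated lines collapse into the first one, and the last line is left untouched.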

CodePudding user response:

Wow. This works perfectly!! Thanks so much. You have saved me an enormous amount of trouble.
