Given this haystack, using PCRE2 regex (PHP >= 7.3):
#1 #2 #3
green [foo] [foo1]
red [foo]
blue [foo] [foo1] [foo2]
yellow [foo2]
green [foo]
green [foo] [foo1]
red [foo]
pink [foo3]
Where:
#1 is always a string that can contain numbers but no spaces.
#2 is always a random amount of whitespace between #1 and #3.
#3 has the same format as #1 but is inside brackets [ ], and there can be multiple bracketed items.
I'm trying to remove all lines that have a duplicate in #1, keeping only the last duplicate line found.
It would look like:
blue [foo] [foo1] [foo2]
yellow [foo2]
green [foo] [foo1]
red [foo]
pink [foo3]
All lines that share the same string in #1 are removed, keeping only the last one.
Lines whose #1 has no duplicate, for example:
pink [foo3]
are kept.
I tried to explain it in as much detail as possible; let me know if it is still unclear or if it's not possible with regex.
CodePudding user response:
You can convert matches of the following regular expression (with flags g, m and i) to empty strings:
^([a-z\d]+).*\n(?=[\s\S]*\b^\1\b)
The flag g prevents returning after the first match, m (multiline) causes ^ and $ to match the beginning and end of lines rather than the beginning and end of the string, and i makes matches case insensitive.
A line is matched (and therefore removed) only if its first field appears again at the start of a later line.
The elements of the expression are as follows:
^            # match beginning of line
([a-z\d]+)   # match one or more letters or digits and save to capture group 1
.*           # match zero or more characters other than newlines
\n           # match linefeed
(?=          # begin positive lookahead
  [\s\S]*    # match zero or more characters including line terminators
  \b^\1\b    # match content of group 1 at the start of a line, with word breaks before and after
)            # end positive lookahead
Note that . matches carriage returns (\r). If the last line may not end with a line feed, change \n to (?:\n|$).
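In PHP, the replacement described above might look like the following minimal sketch. It uses a positive lookahead, (?=...), so that a line is removed only when its first field reappears at the start of a later line. PHP has no g flag; preg_replace() replaces every match by default. The variable names are illustrative only.

```php
<?php
// Sample input from the question (nowdoc, so nothing is interpolated).
$str = <<<'TXT'
green [foo] [foo1]
red [foo]
blue [foo] [foo1] [foo2]
yellow [foo2]
green [foo]
green [foo] [foo1]
red [foo]
pink [foo3]
TXT;

// m: ^ matches at the start of every line; i: case-insensitive matching.
// The lookahead succeeds only if the captured field starts a later line.
$re = '/^([a-z\d]+).*\n(?=[\s\S]*\b^\1\b)/mi';

// Every matched line (an earlier duplicate) is replaced by an empty string.
$result = preg_replace($re, '', $str);

echo $result, PHP_EOL;
```

The last line of the input never matches because nothing follows it, so it is always kept.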
If you wish to identify any strings that do not possess the required format you can use the following regular expression to match incorrectly-formatted lines:
^(?![a-z\d]*(?: *\[[^[\]\r\n]*\])+\r?\n).*
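As a rough illustration of that format check in PHP, the sketch below collects the lines that fail it. It assumes the bracketed group is repeated with + and that every line, including the last, ends with a line feed, since the pattern requires \r?\n after the final bracket. The sample input and variable names are hypothetical.

```php
<?php
// Hypothetical input: two well-formed lines and two malformed ones.
// Every line, including the last, ends with a line feed.
$text = "green [foo] [foo1]\nblue\nred [foo\npink [foo3]\n";

// The negative lookahead rejects well-formed lines, so .* only matches
// lines that do NOT consist of a field followed by one or more [..] items.
$badLineRe = '/^(?![a-z\d]*(?: *\[[^[\]\r\n]*\])+\r?\n).*/mi';

preg_match_all($badLineRe, $text, $m);
$badLines = $m[0];

print_r($badLines);
```

Here "blue" is flagged because it has no bracketed item, and "red [foo" because its bracket is never closed.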
CodePudding user response:
You could use
^(\S+)\h+\[\S*\](?!\S).*$(?![\s\S]*^\1)
^ Start of a line (with the m flag)
(\S+) Capture group 1, match 1+ non-whitespace characters
\h+ Match 1+ horizontal whitespace characters
\[\S*\](?!\S) Match from an opening [ till closing ] and assert a whitespace boundary to the right to not match [foo]a
.*$ Match the rest of the line
(?![\s\S]*^\1) Negative lookahead, assert that capture group 1 does not occur at the start of any later line
For example
$re = '/^(\S+)\h+\[\S*\](?!\S).*$(?![\s\S]*^\1)/m';
$str = 'green [foo] [foo1]
red [foo]
blue [foo] [foo1] [foo2]
yellow [foo2]
green [foo]
green [foo] [foo1]
red [foo]
pink [foo3]';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => blue [foo] [foo1] [foo2]
[1] => yellow [foo2]
[2] => green [foo] [foo1]
[3] => red [foo]
[4] => pink [foo3]
)
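If the cleaned text is wanted as a single string rather than an array, one option is to join the matched lines again. The following is a minimal sketch along those lines, with the quantifiers written as \S+ and \h+; the variable $cleaned is an illustrative name.

```php
<?php
// Each match is a line whose first field does not start any later line.
$re = '/^(\S+)\h+\[\S*\](?!\S).*$(?![\s\S]*^\1)/m';

$str = <<<'TXT'
green [foo] [foo1]
red [foo]
blue [foo] [foo1] [foo2]
yellow [foo2]
green [foo]
green [foo] [foo1]
red [foo]
pink [foo3]
TXT;

preg_match_all($re, $str, $matches);

// Join the kept lines back into one newline-separated string.
$cleaned = implode("\n", $matches[0]);

echo $cleaned, PHP_EOL;
```

Because ^\1 has no trailing boundary, a field such as red would also count as repeated if a later line started with redish. Appending (?!\S) after \1 is one possible way to tighten this; that tweak is an assumption and not part of the pattern above.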