How can I use Regular Expressions to Ignore Delineated Blocks of Text?

Time:04-12

Is there a regular expression for excluding delineated text, i.e. anything that falls between two symbols? I can get it to work when I want to match only what is between the two symbols (for example, quotes):

$str = 'Here is the text, "text!", I want to format.';
echo preg_replace( "/(?<=[\"])[^\"]*(?=[\"])/", "?", $str );

yielding:

Here is the text, "?", I want to format.

But I can't figure out the opposite: having the regex engine ignore whatever is between the quotes, parentheses, braces, etc. Ideally, such a solution would yield:

? "text!" ?

There are a lot of case-by-case answers out there, but I can't find a generalized solution that I can apply to different situations when I want to do a regex search that excludes a block of text.

CodePudding user response:

Use a capture group to match the part you want to keep in the result.

preg_replace('/.*?("[^"]*").*/', '? $1 ?', $str);
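
As a rough check of what this pattern does and where it stops working, the sketch below runs it on the question's string and on a second string with two quoted blocks. The second input is made up for illustration and is not from the question.

```php
<?php
// Capture-group approach: keep the first quoted block, replace everything around it.
$pattern = '/.*?("[^"]*").*/';

$one = 'Here is the text, "text!", I want to format.';
$resultOne = preg_replace($pattern, '? $1 ?', $one);
echo $resultOne, PHP_EOL; // ? "text!" ?

// Hypothetical input with two quoted blocks: the trailing .* swallows the second one.
$two = 'a "b" c "d" e';
$resultTwo = preg_replace($pattern, '? $1 ?', $two);
echo $resultTwo, PHP_EOL; // ? "b" ?
```

So this seems to fit the single-block case from the question, but it is probably not the generalized solution when several delimited blocks can appear in one string.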

CodePudding user response:

Maybe try something like

("[^"]*")(*SKIP)(*F)|(?:(?!(?1)).) 
regex note
("[^"]*") matches a double-quoted block and captures it in group 1
(*SKIP)(*F) discards that match and stops the engine from retrying inside it, so the quoted block is left untouched
(?:(?!(?1)).) matches a single character at any position where the group 1 pattern does not begin; append + to match the whole run outside the quotes as one unit
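
A minimal sketch of how this could be used with preg_replace, applied to the string from the question. It assumes a + is appended to the last group so that each run outside the quotes is replaced once rather than character by character. The padded replacement plus trim() is an extra step added here only to reproduce the spacing asked for in the question.

```php
<?php
// Assumption: a + is appended so a whole unquoted run becomes one match.
$pattern = '/("[^"]*")(*SKIP)(*F)|(?:(?!(?1)).)+/';

$str = 'Here is the text, "text!", I want to format.';

// Each run outside the quotes is replaced; the quoted block is skipped.
$result = preg_replace($pattern, '?', $str);
echo $result, PHP_EOL; // ?"text!"?

// The spaces next to the quotes belong to the replaced runs, so pad the
// replacement and trim the ends to get the spacing from the question.
$spaced = trim(preg_replace($pattern, ' ? ', $str));
echo $spaced, PHP_EOL; // ? "text!" ?
```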


You can also swap the pattern in group 1 for any other block you want to skip, for example text in parentheses:

(\([^)(]*\))(*SKIP)(*F)|(?:(?!(?1)).)
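
One way to reuse the idea across delimiters is to wrap it in a small function. The helper below, replace_outside, is hypothetical and not part of the answer. It assumes the caller passes a valid PCRE fragment for the block to skip, with any ~ escaped, and it appends + so each outside run is replaced once. The sample inputs are made up.

```php
<?php
// Hypothetical helper (not from the answer): replace every run of text
// that lies outside blocks matched by $blockPattern.
// Assumes $blockPattern is a valid PCRE fragment with any "~" escaped.
// Returns null if preg_replace reports an error.
function replace_outside(string $blockPattern, string $replacement, string $subject): ?string
{
    // Group 1 holds the block to skip; (?1) reuses it inside the lookahead.
    // The s flag lets . cross newlines.
    $regex = '~(' . $blockPattern . ')(*SKIP)(*F)|(?:(?!(?1)).)+~s';
    return preg_replace($regex, $replacement, $subject);
}

// Skip parenthesized blocks.
$parens = replace_outside('\([^)(]*\)', '?', 'keep (this) and (that) out');
echo $parens, PHP_EOL; // ?(this)?(that)?

// Skip brace-delimited blocks.
$braces = replace_outside('\{[^{}]*\}', '?', 'a {b} c');
echo $braces, PHP_EOL; // ?{b}?
```

Note that neither block pattern handles nested delimiters; [^)(]* and [^{}]* only match blocks with no inner parentheses or braces.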
