Removing everything except numbers inside braces and some characters

Time:08-10

I want to remove everything except #, newlines, and complete bracket pairs that contain only numbers.
Sample input:

# (1296) {20} [529] [1496] [411]
# (MONDAY ) (1296)
# (646) {20} (BEACH 7) [20 Mtrs] { 03 Foot }
# {19} [455] [721] (1296) (SUNDAY ) [2741] (MONDAY (WEDNESDAY {20}
# {19} (1296)

Code which does not work:

$re = '/(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}@#]+/m';
$result = preg_replace($re, '', $input);

Incorrect output:

#(1296){20}[529][1496][411]
#(1296) 
#(646){20}(BEACH 7)[20 Mtrs]{ 03 Foot }
#{19}[455][721](1296)[2741](({20}
#{19}(1296)

Desired output:

#(1296) {20} [529] [1496] [411]
#(1296)
#(646) {20}
#{19} [455] [721] (1296) [2741] {20}
#{19} (1296)

CodePudding user response:

You could match at least 1 digit between the brackets and then skip that match.

Then match any char except a newline or # to be replaced with an empty string.

(?:\[\h*\d[\h\d]*]|\(\h*\d[\h\d]*\)|{\h*\d[\h\d]*})\h*(*SKIP)(*F)|[^#\n]

Explanation

  • (?: Non capture group
    • \[\h*\d[\h\d]*] Match at least 1 digit between square brackets, where \h matches horizontal whitespace characters (no newlines)
    • | Or
    • \(\h*\d[\h\d]*\) Match at least 1 digit between parentheses
    • | Or
    • {\h*\d[\h\d]*} Match at least 1 digit between curly braces
  • )\h* Close the non-capture group and match optional trailing horizontal whitespace
  • (*SKIP)(*F) Skip and fail the match (to leave it untouched in the output)
  • | Or
  • [^#\n] Match any character except # or a newline
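
To see the pattern above in action, here is a minimal PHP sketch that runs it over the sample input from the question. The function name keepNumericBrackets and the final trimming step are assumptions added for illustration, not part of the answer: the regex keeps the spaces that follow a numeric bracket pair, so a line whose trailing groups are all removed (the third sample line) would otherwise end with a space.

```php
<?php
// Sample input copied from the question.
$input = <<<'TXT'
# (1296) {20} [529] [1496] [411]
# (MONDAY ) (1296)
# (646) {20} (BEACH 7) [20 Mtrs] { 03 Foot }
# {19} [455] [721] (1296) (SUNDAY ) [2741] (MONDAY (WEDNESDAY {20}
# {19} (1296)
TXT;

// Hypothetical helper name, used only for this example.
function keepNumericBrackets(string $text): string
{
    // First alternative: a bracket pair holding only digits and horizontal
    // whitespace (plus any spaces after it) is matched, then (*SKIP)(*F)
    // discards the match so that text is left untouched.
    // Second alternative: any other character except "#" and newline is removed.
    $re = '/(?:\[\h*\d[\h\d]*]|\(\h*\d[\h\d]*\)|{\h*\d[\h\d]*})\h*(*SKIP)(*F)|[^#\n]/';
    $result = preg_replace($re, '', $text);

    // Assumption: trailing spaces at the end of a line are unwanted, so trim them.
    return preg_replace('/\h+$/m', '', $result);
}

echo keepNumericBrackets($input), "\n";
```

With the trimming step included, the result for the sample input should match the desired output listed in the question.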


CodePudding user response:

You may match using this regex:

(?:(\()|({)|\[)[\h\d]*+([^])}\s\d])(?(1)[^()]*\)|(?(2)[^{}]*}|[^][]*]))\h*|(?<=#)\h+|\([^\s)]++\h+

and replace with an empty string.


RegEx Details:

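  • (?:(\()|({)|\[)[\h\d]*+([^])}\s\d])(?(1)[^()]*\)|(?(2)[^{}]*}|[^][]*])): Match (...) or {...} or [...] if they contain at least one character that is neither a digit nor whitespace
  • \h*: Match 0 or more horizontal whitespace characters
  • |: OR
  • (?<=#)\h+: Match 1+ horizontal whitespace characters after #
  • |: OR
  • \([^\s)]++\h+: Match ( and 1+ non-whitespace characters followed by 1+ horizontal whitespace characters, which removes unclosed groups such as (MONDAY in the sample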
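
As a rough illustration of this second approach, the PHP sketch below applies the pattern with preg_replace. It assumes the quantifiers read *+, + and ++ at the positions shown (a possessive run of digits and spaces, then one-or-more matches), and the function name stripNonNumericBrackets is hypothetical. Unlike the skip-and-fail approach, this pattern matches the unwanted pieces directly and leaves everything else alone.

```php
<?php
// Hypothetical helper name; the pattern is the regex from this answer,
// split across three string pieces (one per alternative) for readability.
function stripNonNumericBrackets(string $text): string
{
    $re = '/(?:(\()|({)|\[)[\h\d]*+([^])}\s\d])(?(1)[^()]*\)|(?(2)[^{}]*}|[^][]*]))\h*'
        . '|(?<=#)\h+'          // whitespace directly after "#"
        . '|\([^\s)]++\h+/';    // unclosed "(" followed by text and spaces
    return preg_replace($re, '', $text);
}

// Two lines taken from the sample input in the question.
echo stripNonNumericBrackets("# (MONDAY ) (1296)"), "\n";
echo stripNonNumericBrackets("# {19} [455] [721] (1296) (SUNDAY ) [2741] (MONDAY (WEDNESDAY {20}"), "\n";
```

One thing to watch for: when every group after a kept pair is removed, as in the third sample line, the space that followed the kept pair stays in place, so the line ends with a trailing space unless it is trimmed separately.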