Removing duplicates from strings based on container [] {} ()

Each line contains tokens inside braces, brackets, or parentheses. Tokens may share the same number but denote different subjects, i.e. [10] and (10) are different. The goal is to write a program that removes duplicates from a line without disturbing the order of the remaining tokens.

Sample input

# [1] {1} [2] {10} [3] [3] [4] {10} {100} [1] [5]
# [5] {1} [6] (10) (20) (10) {10} [7] [5] [6] {10} {100} [8] [5] (10) (30)
# (10) (30) [7] {1} [8] {10} [9] [7] [8] {100} {101} [9] [5] {101} (50)

Desired output

# [1] {1} [2] {10} [3] [4] {100} [5]
# [5] {1} [6] (10) (20) {10} [7] {100} [8] (30)
# (10) (30) [7] {1} [8] {10} [9] {100} {101} [5] (50)

The code below can be used to handle duplicate numbers inside parentheses, but not inside braces or brackets.

$re = '/\((\d+)\)/';
if (preg_match_all($re, $details_, $matches)) {
    print_r($matches[1]);
}

CodePudding user response：

$input = '[1] {1} [2] {10} [3] [3] [4] {10} {100} [1] [5]';

$result = array_unique(explode(' ', $input));
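A minimal sketch of this approach for a single line without the leading "# " marker. array_unique keeps the first occurrence of each value and compares tokens as whole strings, so [10] and (10) stay distinct. The function name dedupeTokens is hypothetical, and exactly one space between tokens is assumed.

```php
<?php
// Hypothetical helper: remove repeated tokens, keeping first occurrences in order.
// Assumes tokens are separated by exactly one space.
function dedupeTokens(string $line): string
{
    $tokens = explode(' ', $line);
    // array_unique compares the full token text, so "[10]" and "(10)" differ.
    return implode(' ', array_unique($tokens));
}

echo dedupeTokens('[1] {1} [2] {10} [3] [3] [4] {10} {100} [1] [5]'), "\n";
// prints: [1] {1} [2] {10} [3] [4] {100} [5]
```

Note that array_unique preserves the original array keys, so the result has gaps in its indexes; implode ignores the keys, which is why joining back works directly.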

CodePudding user response：

You could match the format of the lines and use a pattern with 3 capture groups, where you would use the values of group 1 and group 2:

^(#\h*)((\[\d+]|\{\d+}|\(\d+\))(?:\h+(?3))*)$

In parts, the pattern matches:

^ Start of string
(#\h*) Capture group 1, match # and optional horizontal whitespace chars
( Capture group 2
- (\[\d+]|\{\d+}|\(\d+\)) Capture group 3, match 1+ digits between either square brackets, curly braces or parentheses
- (?:\h+(?3))* Optionally repeat 1+ horizontal whitespace chars and recurse the group 3 pattern
) Close group 2
$ End of string
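To see what the groups hold, here is a small sketch that applies the pattern (written with its `+` quantifiers) to a single made-up line. The sample strings are assumptions for illustration only; the /m flag is left out because only one line is tested.

```php
<?php
// Group 1 is the "# " prefix, group 2 is the whole run of tokens.
$re = '/^(#\h*)((\[\d+]|\{\d+}|\(\d+\))(?:\h+(?3))*)$/';
$line = '# [1] {1} [2] [1]';

if (preg_match($re, $line, $m)) {
    var_dump($m[1]); // string(2) "# "
    var_dump($m[2]); // string(15) "[1] {1} [2] [1]"
}

// A line containing a token that is not digits inside a container
// does not match at all, so it would be left out of the result.
var_dump(preg_match($re, '# [1] {x} [2]')); // int(0)
```

Group 2 still contains the duplicates; the pattern only validates the line format and separates the prefix from the tokens, and the de-duplication itself happens afterwards on the group 2 value.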

For example:

$re = '/^(#\h*)((\[\d+]|\{\d+}|\(\d+\))(?:\h+(?3))*)$/m';
$str = '# [1] {1} [2] {10} [3] [3] [4] {10} {100} [1] [5]
# [5] {1} [6] (10) (20) (10) {10} [7] [5] [6] {10} {100} [8] [5] (10) (30)
# (10) (30) [7] {1} [8] {10} [9] [7] [8] {100} {101} [9] [5] {101} (50)';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo implode("\n", array_map(fn($x) => $x[1] . implode(' ', array_unique(explode(' ', $x[2]))), $matches));

Output

# [1] {1} [2] {10} [3] [4] {100} [5]
# [5] {1} [6] (10) (20) {10} [7] {100} [8] (30)
# (10) (30) [7] {1} [8] {10} [9] {100} {101} [5] (50)

If there can be more than one space between tokens, you can use preg_split('/\h+/', $x[2]) instead of explode.
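A short sketch of that preg_split variant, applied to the token part of a line. The helper name dedupeTokensLoose and the sample string are hypothetical; PREG_SPLIT_NO_EMPTY is added here as an assumption so that leading or trailing whitespace does not produce empty tokens.

```php
<?php
// Hypothetical helper: like the explode-based version, but tolerant of
// runs of spaces or tabs between tokens.
function dedupeTokensLoose(string $tokens): string
{
    // Split on 1+ horizontal whitespace chars and drop empty pieces.
    $parts = preg_split('/\h+/', $tokens, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_unique($parts));
}

echo dedupeTokensLoose("[1]  {1}\t[1]   (1) {1}"), "\n";
// prints: [1] {1} (1)
```

One side effect of this design is that the output is normalized to single spaces, so the original spacing of the line is not preserved.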