Perl regex to match number block and mask them but retain separator

Time:04-17

I have numbers like these:

1111-1111-1111-1111
4444.4444.4444.4444
7777_7777_7777_7777
2222 2222 2222 2222
22 2222 2222 22
1 2 3 4 5555_5555 5647.1234

I want to mask them like this:

XXXX-XXXX-XXXX-XXXX
XXXX.XXXX.XXXX.XXXX
XXXX_XXXX_XXXX_XXXX
XXXX XXXX XXXX XXXX
22 XXXX XXXX 22
1 2 3 4 XXXX_XXXX XXXX.XXXX

I have tried the following patterns:

 qr/(\d{4,})([\,\_\-\#\*\.\s])(\d{4})/,
 qr/((\d{4})([\,|\_|\-|\#|\*|\s|\.])){1,}\d{4}/,
 qr/(?=\d{4})/,
 qr/(\d{4})(\s|\,|\_|\.|\-|\#|\*)(\d{4}){1,}(\d{4})/,
 qr/(\d{4})(\s|\,|\_|\.|\-|\#|\*)(\d{4})(\s|\,|\_|\.|\-|\#|\*)(\d{4})(\s|\,|\_|\.|\-|\#|\*)(\d{4})/,

but none of them worked, since I need to retain the separators.

Note: I want to do this only under a specific condition, e.g. 1111 1111 is not valid due to the length rule.

When I encounter

1 11111 2222 2222 2222

the output should be

1 11111 XXXX XXXX XXXX

Example 2:

123  1234.1235

Output:

123  XXXX.XXXX

Hence I want to capture:

4 digits => \d{4}
separator => [\,\s\.\-\_\#]

The pattern above can occur any number of times, e.g.

1111.1111.1111.1111.1111.1111

Output:

XXXX.XXXX.XXXX.XXXX.XXXX.XXXX

CodePudding user response:

You can first match the full pattern, and then replace all the digits with an X char.

^\d{4}(?:[._ -]\d{4}){3}$

Explanation

  • ^ Start of string
  • \d{4} Match 4 digits
  • (?:[._ -]\d{4}){3} Repeat 3 times: match one of the separators followed by 4 digits
  • $ End of string


For example, using PHP:

$strings = [
    "1111-1111-1111-1111",
    "4444.4444.4444.4444",
    "7777_7777_7777_7777",
    "2222 2222 2222 2222",
    "2222.2222_2222 2222"
];
// The whole string must be four 4-digit groups joined by single separators.
$pattern = '/^\d{4}(?:[._ -]\d{4}){3}$/';

foreach ($strings as $str) {
    if (preg_match($pattern, $str)) {
        // Replace digits only, so the separators are left untouched.
        echo preg_replace('/\d/', 'X', $str) . PHP_EOL;
    }
}

Output

XXXX-XXXX-XXXX-XXXX
XXXX.XXXX.XXXX.XXXX
XXXX_XXXX_XXXX_XXXX
XXXX XXXX XXXX XXXX
XXXX.XXXX_XXXX XXXX
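
Since the question asks for Perl, the same "match the run first, then replace only the digits" idea could be sketched as below. This is an adaptation, not the code from the answer: the anchors are replaced by digit lookarounds so that runs inside a longer line are found, `{3}` becomes `+` so any number of groups is allowed, and the separator set is taken from the question. It assumes a lone 4-digit group with no neighbouring group should stay unmasked (change `+` to `*` to mask those too). The name `mask_runs` is hypothetical.

```perl
use strict;
use warnings;

# Separator set taken from the question (assumption: adjust as needed).
my $sep = qr/[,._#\s-]/;

# mask_runs is a hypothetical helper name, not part of either answer.
sub mask_runs {
    my ($text) = @_;
    $text =~ s{
        (?<!\d)                        # not preceded by a digit
        ( \d{4} (?: $sep \d{4} )+ )    # two or more 4-digit groups, one separator between each
        (?!\d)                         # not followed by a digit
    }{
        (my $run = $1) =~ tr/0-9/X/;   # mask the digits, keep the separators
        $run;
    }gex;
    return $text;
}

print mask_runs($_), "\n" for (
    '1111-1111-1111-1111',       # XXXX-XXXX-XXXX-XXXX
    '22 2222 2222 22',           # 22 XXXX XXXX 22
    '1 11111 2222 2222 2222',    # 1 11111 XXXX XXXX XXXX
    '123  1234.1235',            # 123  XXXX.XXXX
);
```

Because `tr` only touches the digits inside the matched run, whatever separator was used is carried over unchanged.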

CodePudding user response:

Am I missing something, or are you overcomplicating things?

This just uses lookaround assertions to make sure that a block of four digits is not adjacent to any other digit, and then replaces that block with XXXX. The separators are never part of the match, so we do not need to worry about them.

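use strict;
use warnings;

while (<DATA>) {
    # Mask blocks of exactly four digits: the lookbehind and lookahead
    # reject any match that touches another digit on either side.
    s/(?<!\d)\d{4}(?!\d)/XXXX/g;
    print;
}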

__DATA__
1111-1111-1111-1111
4444.4444.4444.4444
7777_7777_7777_7777
2222 2222 2222 2222
22 2222 2222 22
1 2 3 4 5555_5555 5647.1234
1 11111 2222 2222 2222

Will output:

XXXX-XXXX-XXXX-XXXX
XXXX.XXXX.XXXX.XXXX
XXXX_XXXX_XXXX_XXXX
XXXX XXXX XXXX XXXX
22 XXXX XXXX 22
1 2 3 4 XXXX_XXXX XXXX.XXXX
1 11111 XXXX XXXX XXXX
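
To see what the lookarounds contribute, here is a small comparison on the 5-digit case from the question. It is only an illustration; the variable names are made up for this sketch.

```perl
use strict;
use warnings;

my $input = '1 11111 2222 2222 2222';

# Without lookarounds, the first four digits of the 5-digit block get masked too.
(my $naive = $input) =~ s/\d{4}/XXXX/g;

# With lookarounds, only blocks of exactly four digits are masked.
(my $guarded = $input) =~ s/(?<!\d)\d{4}(?!\d)/XXXX/g;

print "$naive\n";      # 1 XXXX1 XXXX XXXX XXXX
print "$guarded\n";    # 1 11111 XXXX XXXX XXXX
```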