Regex to find certain phone numbers in a damaged file

Time:01-03

I have the following task:

Use grep with the -Pao options and a regular expression to extract all phone numbers from the broken file (solution: 13 phone numbers). The regular expression should match as closely as possible the following formats of phone numbers and be as short as possible:

[Image: list of the required phone number formats]

I tried to start from the beginning of each number format, intending to combine the pieces and build up the full expression from there.

I now have the following code:

grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})' kaputt.txt

(the mode is PCRE)

Unfortunately, this does not return the desired results; the alternatives seem to be mutually exclusive. I would therefore be grateful for help here.

CodePudding user response:

Are there blanks on both sides of the pipes? If so, they are part of the pattern: the first alternative is actually (\+\d{2}.)\s, with a trailing blank, which doesn't match any of the formats.

https://regex101.com/r/qDmGIC/1 - but it will also match some unwanted combinations like 111 (1)11 11
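
The effect of blanks around the pipes can be checked on a small input. This is only a sketch: the sample line is made up, the alternatives are shortened from the question's pattern, and it assumes a grep built with -P (PCRE) support.

```shell
# Hypothetical sample line: digits with no blank after them.
line='tel:1234'

# Blanks around the pipe belong to the alternatives:
# '(\d{3,4}) ' requires a trailing blank, so nothing matches here.
with_blanks=$(printf '%s\n' "$line" | grep -Pao '(\d{3,4}) | (\d\s\d{2})' || echo 'no match')

# The same alternatives without the blanks.
without_blanks=$(printf '%s\n' "$line" | grep -Pao '(\d{3,4})|(\d\s\d{2})' || echo 'no match')

echo "with blanks:    $with_blanks"
echo "without blanks: $without_blanks"
```

Only the second call finds 1234, because in the first one every alternative demands a blank that the line does not contain.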

CodePudding user response:

It would be a fool's errand to try and find the absolute shortest regex possible. The following should be fine as no format seems to be an extension of another.

grep -Pao "(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)" kaputt.txt

It is just the text extracted from your image (!) of the required formats, with x replaced by \d, - replaced by \-, + replaced by \+, literal parentheses escaped as \( and \), and with each format alternative separated by |.
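
One way to keep such an alternation readable is to store one format per line and join the lines with |. The sketch below uses only the last three alternatives from the command above; the variable names and the sample numbers are made up for illustration, and a grep with -P support is assumed.

```shell
# One PCRE fragment per format (the last three alternatives of the long regex).
formats='\d{4} \d \d{6}
\d \d\d \/ \d\d \d\d \d\d
\d{8}\-\d\d'

# Join the lines with | and wrap them in a non-capturing group.
pattern="(?:$(printf '%s\n' "$formats" | paste -s -d '|' -))"

# Hypothetical input lines with surrounding noise.
matches=$(printf '%s\n' 'a 0171 2 345678 b' 'c 0 30 / 12 34 56 d' 'e 12345678-90 f' \
  | grep -Pao "$pattern")

printf '%s\n' "$matches"
```

Each input line yields exactly one match here, with the surrounding characters dropped by -o.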

If you want to match across lines then the -z flag is required and each space could be replaced with, for example, \s .
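
A small sketch of the cross-line case, assuming GNU grep with -P support and a made-up number that is split over two lines. With -z the input is treated as NUL-separated, so the newline becomes ordinary whitespace that \s can match, and each match is terminated by a NUL byte instead of a newline (hence the tr).

```shell
# Hypothetical number in the format "xxxx x xxxxxx", broken across two lines.
input='0171 2
345678'

# Line by line, neither half matches on its own.
per_line=$(printf '%s\n' "$input" | grep -Pao '\d{4}\s\d\s\d{6}' || echo 'no match')

# With -z the line break is matched by \s; tr turns the NUL terminator into a newline.
across_lines=$(printf '%s\n' "$input" | grep -Pzao '\d{4}\s\d\s\d{6}' | tr '\0' '\n')

echo "per line:     $per_line"
echo "across lines: $across_lines"
```

The match printed for the -z call still contains the original line break, since -o prints the matched text unchanged.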
