I am trying to locate all phone numbers across various files, including JSON and TXT.
A match should contain exactly 10 or 11 digits (for example 012-345-6789 or 0-012-345-6789), NOT more and NOT fewer. The phone numbers are often surrounded by text, but sometimes by spaces and tabs (see the examples below). The phone numbers sometimes also include hyphens "-" and parentheses "()" to separate the groups of digits.
abc0123456789def <- match
abc10123456789def <- match
abc101234567899def <- no match (12 numbers)
abc101234567def <- no match (9 numbers)
abc 0123456789 def <- match
abc 10123456789 def <- match
abc1(012)345-6789def <- match
abc1-012-345-6789def <- match
abc(012)345-6789def <- match
abc012-345-6789def <- match
abc 1(012)345-6789 def <- match
Your help is super appreciated!
CodePudding user response:
If I recall grep correctly then:
grep -iP "(?:^|(?<=\D))\d?(?:\(\d{3}\)|-?\d{3})-?\d{3}-?\d{4}(?=\D|$)"
(?:^|(?<=\D)) - behind me is the start of the line or a non-digit char
\d? - optional leading digit
(?: - start non-capturing group
\(\d{3}\) - format equivalent to (555)
| - or
-?\d{3} - format equivalent to -555, with the hyphen being optional
) - end non-capturing group
-?\d{3}-?\d{4} - format equivalent to -555-5555, with optional hyphens
(?=\D|$) - ahead of me is a non-digit char or the end of a line
Here it is in PHP: https://regex101.com/r/Gdeiq7/1
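If grep with -P is not available, the same pattern can be tried in Python, whose re module accepts this syntax unchanged. The following is only a minimal sketch: the names find_phone_numbers and scan_files are made up for illustration, and the default choice of .json and .txt suffixes is an assumption taken from the question.

```python
import re
from pathlib import Path

# Same pattern as the grep command, spread out with re.VERBOSE so each
# piece can carry a comment.
PHONE_RE = re.compile(r"""
    (?:^|(?<=\D))          # start of line, or preceded by a non-digit
    \d?                    # optional leading digit
    (?:\(\d{3}\)|-?\d{3})  # (555), or 555 with an optional leading hyphen
    -?\d{3}                # 555 with an optional leading hyphen
    -?\d{4}                # 5555 with an optional leading hyphen
    (?=\D|$)               # followed by a non-digit or the end of the line
""", re.VERBOSE | re.MULTILINE)


def find_phone_numbers(text):
    """Return every substring of text matched by the pattern (hypothetical helper)."""
    return [m.group(0) for m in PHONE_RE.finditer(text)]


def scan_files(root, suffixes=(".json", ".txt")):
    """Yield (path, line_number, match) for matching files under root (hypothetical helper)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # errors="replace" keeps the scan going on files that are not valid UTF-8.
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                for number in find_phone_numbers(line):
                    yield path, lineno, number
```

Calling find_phone_numbers on the lines from the question is a quick way to confirm that the 9-digit and 12-digit runs are rejected; scan_files(".") would then walk the current directory.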