How to use positive lookbehind to find only certain 3 digit numbers?

I have this text where I want to identify only certain three digit numbers using my depart city (NYC) as the positive lookbehind expression. I don't want to include it or anything else in the result, other than the desired three digit number. I can't simply use \d{3,} because there are other three digit numbers in this text I haven't included here which should not be in the output.

"Depart: NYC (etd 9/30), NJ (etd 10/4)
Arrive LAX
Rate: USD500, 700P"

With the pattern (?<=NYC)(\D|\S)*\d{3,}, the match is

" (etd 9/30), NJ (etd 10/4) Arrive LAX Rate: USD500, 700"

but I want it to output "700" only. I also want to write a regex that will only output 500 without using USD as the positive lookbehind expression. Is this possible? I've also tried

(?<=NYC)(?<=(\D|\S)*)\d{3,}

but this doesn't output anything.
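Both attempts can be reproduced with a minimal sketch, assuming Python's built-in re module (the one the answers below use). In the first pattern, the greedy (\D|\S)* runs to the end of the string and backtracks only until \d{3,} can match the last run of digits. The second pattern is rejected at compile time, because re only accepts fixed-width lookbehind. In an engine that does allow variable-width lookbehind, both lookbehinds are still checked at the same position, so \d{3,} would have to start immediately after "NYC", which never happens in this text. The helper compile_error is hypothetical, written only for this illustration.

```python
import re

text = """Depart: NYC (etd 9/30), NJ (etd 10/4)
Arrive LAX
Rate: USD500, 700P"""

# Attempt 1: (\D|\S)* is greedy, so it consumes everything after NYC and
# backtracks only until \d{3,} can match the LAST run of 3+ digits.
m = re.search(r"(?<=NYC)(\D|\S)*\d{3,}", text)
print(repr(m.group(0)))

# Attempt 2: Python's re refuses variable-width lookbehind when compiling.
def compile_error(pattern):
    """Hypothetical helper: return the re.error message, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

print(compile_error(r"(?<=NYC)(?<=(\D|\S)*)\d{3,}"))
```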

Answer:

You can use

(?s)NYC.*?\b(\d{3,})

Details:

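(?s) - inline form of the re.DOTALL flag, so . also matches newlines
NYC - the literal text NYC
.*? - any zero or more chars, as few as possible
\b - a word boundary, so digits glued to a preceding letter (as in USD500) are skipped
(\d{3,}) - Group 1: three or more digits.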

See the Python demo:

import re
text = """Depart: NYC (etd 9/30), NJ (etd 10/4)
Arrive LAX
Rate: USD500, 700P"""
m = re.search(r'(?s)NYC.*?\b(\d{3,})', text)
if m:
    print(m.group(1))

# => 700
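To see what the word boundary contributes, a quick sketch (reusing the sample text, and using re.findall only for comparison) lists the digit runs found with and without \b:

```python
import re

text = """Depart: NYC (etd 9/30), NJ (etd 10/4)
Arrive LAX
Rate: USD500, 700P"""

# Without \b, any run of 3+ digits qualifies, including the one glued to "USD".
without_boundary = re.findall(r"\d{3,}", text)

# With \b, the run must start where a non-word char (or the string start)
# meets a digit; "D" and "5" are both word chars, so "USD500" is skipped.
with_boundary = re.findall(r"\b\d{3,}", text)

print(without_boundary)  # ['500', '700']
print(with_boundary)     # ['700']
```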

Answer:

(\D|\S) matches either a character that is not a digit or a character that is not whitespace. Every digit is a non-whitespace character, so the alternation matches any character at all, including newlines. It can also be written as [\s\S], or you can let the dot match any character with the inline modifier (?s) or the re.DOTALL flag.
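A small sketch, assuming Python's re module and an arbitrary sample of characters chosen only for illustration, checks that these spellings all accept every character, while a plain dot without DOTALL rejects a newline:

```python
import re

# A mix of letters, digits, punctuation and whitespace (including a newline).
sample = "aZ09/$ \t\n"

# Pattern -> flags passed to re.fullmatch.
patterns = {
    r"(\D|\S)": 0,      # alternation from the question
    r"[\s\S]": 0,       # character-class spelling
    r"(?s).": 0,        # dot with the inline DOTALL modifier
    r".": re.DOTALL,    # dot with the re.DOTALL flag
}

# For each pattern, check whether it matches every single character.
matches_all = {
    pat: all(re.fullmatch(pat, ch, flags) for ch in sample)
    for pat, flags in patterns.items()
}
print(matches_all)

# A plain dot without DOTALL does not match a newline.
plain_dot_newline = re.fullmatch(r".", "\n")
print(plain_dot_newline)  # None
```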

To match only the first occurrence of exactly 3 digits (the 500) without using USD as a positive lookbehind, you can capture 3 digits that are not surrounded by other digits:

^[\s\S]*?(?<!\d)(\d{3})(?!\d)

The pattern matches:

^ Start of string
[\s\S]*? Match any char including newlines, as few as possible
(?<!\d) Assert not a digit to the left
(\d{3}) Capture 3 digits
(?!\d) Assert not a digit to the right

Python demo:

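import re

# DOTALL lets .*? run across newlines; (?<!\d) and (?!\d) reject digits
# that are part of a longer number
pattern = r"^.*?(?<!\d)(\d{3})(?!\d)"

s = ("\"Depart: NYC (etd 9/30), NJ (etd 10/4)\n"
    "Arrive LAX\n"
    "Rate: USD500, 700P\"\n")

m = re.search(pattern, s, re.DOTALL)
if m:
    print(m.group(1))

Output

500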
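A leaner variant, offered as a sketch rather than as this answer's own pattern: re.search already scans left to right and returns the leftmost match, so the ^.*? prefix is not required to get the first occurrence; the lookarounds alone are enough. re.findall with the same pattern lists every candidate, which helps when checking what else it would pick up in the full text (the question mentions other three-digit numbers not shown here).

```python
import re

s = """Depart: NYC (etd 9/30), NJ (etd 10/4)
Arrive LAX
Rate: USD500, 700P"""

# Exactly three digits that are not part of a longer digit run.
three_digits = r"(?<!\d)\d{3}(?!\d)"

# re.search returns the leftmost match, i.e. the first occurrence.
first = re.search(three_digits, s)
print(first.group())  # 500

# re.findall lists every candidate the pattern would accept.
print(re.findall(three_digits, s))  # ['500', '700']
```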

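To match the 700, you can capture 3 digits preceded by a word boundary and assert that no digit follows. This pattern does not reference NYC; it returns the first three-digit number that starts at a word boundary, which in this text is 700: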

^[\s\S]*?\b(\d{3})(?!\d)


Python demo:

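import re

# \b before the digits rules out a preceding letter or digit (USD500 is skipped);
# (?!\d) rules out a following digit
pattern = r"^.*?\b(\d{3})(?!\d)"
s = ("\"Depart: NYC (etd 9/30), NJ (etd 10/4)\n"
    "Arrive LAX\n"
    "Rate: USD500, 700P\"\n")

m = re.search(pattern, s, re.DOTALL)
if m:
    print(m.group(1))

Output

700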
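The patterns above never look for NYC, so a three-digit number appearing before NYC in the full text could be returned instead. Since Python's re does not support a variable-width lookbehind such as (?<=NYC.*), one workaround, sketched here under the assumption that the marker's first occurrence is the one that matters, is to locate the marker with str.find and start the search just after it using the pos argument of a compiled pattern's search method. The helper first_after is hypothetical, written only for this illustration.

```python
import re

s = """Depart: NYC (etd 9/30), NJ (etd 10/4)
Arrive LAX
Rate: USD500, 700P"""

def first_after(marker, pattern, text):
    """Hypothetical helper: group 1 of the first match after marker, or None."""
    start = text.find(marker)
    if start == -1:
        return None  # marker not present in the text
    # Pattern.search(string, pos) begins scanning at pos instead of 0,
    # standing in for the unsupported variable-width lookbehind.
    m = re.compile(pattern).search(text, start + len(marker))
    return m.group(1) if m else None

print(first_after("NYC", r"\b(\d{3})(?!\d)", s))       # 700
print(first_after("NYC", r"(?<!\d)(\d{3})(?!\d)", s))  # 500
```

Passing pos is not identical to slicing the string (for example, ^ still refers to the real start of the string), so patterns used this way are best kept free of ^.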