I'm writing a Python script and I need to extract two pieces of information from the following text:
The user XXXXXXXX ([email protected]) was involved in an impossible travel incident. The user connected from two countries within 102 minutes, from these IP addresses: Country1 (111.111.111.111) and Country2 (222.222.222.222). Another irrelevant staff...
I need "Country1" and "Country2". I already extracted the IPs so I can look for them in my expression.
With this regex: (?> )(.*)(?= \(111\.111\.111\.111)
I get all of this:
The user XXXXXXXX ([email protected]) was involved in an impossible travel incident. The user connected from two countries within 102 minutes, from these IP addresses: Country1
Is there a way to match characters going backward and stop at the first space, so that I get just "Country1"?
Or does anyone know a better way to extract "Country1" and "Country2", with a regex or directly in Python?
CodePudding user response:
If your message pattern is always the same, you can get the countries like this in Python:
your_string = 'The user XXXXXXXX ([email protected]) ...'
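# Keep the text after "IP addresses: ", then separate the two "Country (IP)" parts
parts = your_string.split(': ')[1].split(' and ')
# Drop everything from " (" onward to leave only the country name
first_country = parts[0].split(' (')[0]
second_country = parts[1].split(' (')[0]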
CodePudding user response:
You can use
\S+(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\))
Details:
\S+ - one or more non-whitespace chars
(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\)) - a positive lookahead that requires the following pattern to appear immediately to the right of the current location:
\s* - zero or more whitespaces
\( - a ( char
\d{1,3}(?:\.\d{1,3}){3} - one to three digits and then three repetitions of . and one to three digits
\) - a ) char.
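For reference, here is a minimal Python sketch of how the lookahead pattern above could be applied with the re module. The sample text is copied from the question; the helper name country_for_ip is hypothetical and only illustrates the case where the IP is already known, as the question describes. It assumes each country name is a single token without spaces.

```python
import re

# Sample message copied from the question.
text = (
    "The user XXXXXXXX ([email protected]) was involved in an impossible "
    "travel incident. The user connected from two countries within 102 "
    "minutes, from these IP addresses: Country1 (111.111.111.111) and "
    "Country2 (222.222.222.222). Another irrelevant staff..."
)

# Any run of non-whitespace chars that is followed by "(<IPv4-like value>)".
BEFORE_ANY_IP = re.compile(r"\S+(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\))")

countries = BEFORE_ANY_IP.findall(text)
print(countries)


def country_for_ip(message, ip):
    """Hypothetical helper: return the token right before "(ip)", or None."""
    # re.escape makes the dots in the IP match literally.
    pattern = r"\S+(?=\s*\(" + re.escape(ip) + r"\))"
    match = re.search(pattern, message)
    return match.group(0) if match else None


print(country_for_ip(text, "111.111.111.111"))
print(country_for_ip(text, "222.222.222.222"))
```

Because \S+ stops at whitespace, a multi-word name such as "United States" would come back as only its last word, so the pattern would need adjusting if such names can occur.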