Going in reverse with RegEx

Time:08-18

I'm writing a Python script and I need to extract two pieces of information from the following text:

The user XXXXXXXX ([email protected]) was involved in an impossible travel incident. The user connected from two countries within 102 minutes, from these IP addresses: Country1 (111.111.111.111) and Country2 (222.222.222.222). Another irrelevant stuff...

I need "Country1" and "Country2". I have already extracted the IPs, so I can use them in my expression.

With this regex: (?> )(.*)(?= \(111\.111\.111\.111)

I get all of this:

The user XXXXXXXX ([email protected]) was involved in an impossible travel incident. The user connected from two countries within 102 minutes, from these IP addresses: Country1

Is there a way to take the characters going backward and make it stop at the first space, so that I get just "Country1"?

Or does anyone know a better way to extract "Country1" and "Country2", with a regex or directly in Python?

Answer:

If your message always follows the same pattern, you can get the countries with plain Python string operations:

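message = 'The user XXXXXXXX ([email protected]) ...'  # the full message text
# Keep the text after ': ', then split it into the two "Country (IP)" chunks on ' and '
parts = message.split(': ', 1)[1].split(' and ')
first_country = parts[0].split(' (')[0]   # country name = text before ' ('
second_country = parts[1].split(' (')[0]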
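A self-contained sketch of the same splitting steps, wrapped in a hypothetical helper `countries_by_split` and run on the sample message from the question (with the trailing irrelevant text left out). It assumes the country list starts right after the first ': ' and that exactly one ' and ' separates the two entries; a country name that itself contains ' and ' would break it.

```python
# Sample message from the question, without the trailing irrelevant text.
SAMPLE = (
    "The user XXXXXXXX ([email protected]) was involved in an impossible "
    "travel incident. The user connected from two countries within 102 "
    "minutes, from these IP addresses: Country1 (111.111.111.111) and "
    "Country2 (222.222.222.222)."
)


def countries_by_split(message):
    """Hypothetical helper: return both country names using string splits only."""
    # Assumption: the country list starts right after the first ': '.
    tail = message.split(': ', 1)[1]
    # Assumption: exactly one ' and ' separates the two "Country (IP)" entries.
    first, second = tail.split(' and ', 1)
    # The country name is everything before ' (' in each entry.
    return first.split(' (')[0], second.split(' (')[0]


print(countries_by_split(SAMPLE))  # ('Country1', 'Country2')
```

Using `maxsplit=1` keeps the helper from splitting on any later ': ' or ' and ' in the irrelevant trailing text.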

Answer:

You can use

\S+(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\))


Details:

  • \S+ - one or more non-whitespace chars
  • (?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\)) - a positive lookahead that requires the following pattern to appear immediately to the right of the current location:
    • \s* - zero or more whitespaces
    • \( - a ( char
    • \d{1,3}(?:\.\d{1,3}){3} - one to three digits, then three repetitions of a literal . followed by one to three digits
    • \) - a ) char.
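A minimal sketch of using this pattern from Python, run on the question's sample message. `re.findall` returns the whole match for a pattern without groups, and since the lookahead does not consume the IP, each match is just the country name. The sketch also adds a hypothetical variant, `country_for_ip`, for the case in the question where the IPs are already known: `re.escape` makes the dots literal, and because `\S+` cannot match a space, the captured text stops at the space in front of the country. Both helper names are illustrative, and both assume single-word country names: for "United States (1.2.3.4)" they return only "States".

```python
import re
from typing import List, Optional

# Sample message from the question, without the trailing irrelevant text.
SAMPLE = (
    "The user XXXXXXXX ([email protected]) was involved in an impossible "
    "travel incident. The user connected from two countries within 102 "
    "minutes, from these IP addresses: Country1 (111.111.111.111) and "
    "Country2 (222.222.222.222)."
)

# The answer's pattern: a run of non-whitespace characters followed (lookahead
# only, not consumed) by optional whitespace and a parenthesised dotted quad.
COUNTRY_BEFORE_IP = re.compile(r'\S+(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\))')


def countries_by_regex(message: str) -> List[str]:
    """Hypothetical helper: every non-space token directly preceding '(IPv4)'."""
    return COUNTRY_BEFORE_IP.findall(message)


def country_for_ip(message: str, ip: str) -> Optional[str]:
    """Hypothetical helper: the token right before a known IP, or None if absent."""
    # re.escape turns '.' into a literal dot; \S+ cannot cross a space,
    # so the captured group is only the word in front of the IP.
    match = re.search(r'(\S+)\s*\(' + re.escape(ip) + r'\)', message)
    return match.group(1) if match else None


print(countries_by_regex(SAMPLE))                 # ['Country1', 'Country2']
print(country_for_ip(SAMPLE, '222.222.222.222'))  # Country2
```

The general pattern does not need the IPs at all, while the per-IP variant ties each country to a specific address, which is handy when the IPs have already been extracted.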