Regex to access digits between two underscores

Time:09-16

I am trying to extract the number that sits between the double underscores. For example, in the text below:

https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0

I need to get only the numbers: 929091, 929092, etc.

I tried '_(.*)_', but I get the underscores too. I just need the number.

CodePudding user response:

Use

re.findall(r'__([0-9]+)__', s)


EXPLANATION

--------------------------------------------------------------------------------
  __                       '__'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  __                       '__'
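
The breakdown above maps directly onto a commented pattern. Here is a minimal sketch, assuming Python's re.VERBOSE flag; the name NUMBER_BETWEEN_UNDERSCORES is hypothetical and only used for illustration:

```python
import re

# Hypothetical name, for illustration only.
# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment.
NUMBER_BETWEEN_UNDERSCORES = re.compile(
    r"""
    __          # literal '__'
    ([0-9]+)    # group 1: one or more digits, as many as possible
    __          # literal '__'
    """,
    re.VERBOSE,
)

sample = "https://http-google-ghh.vault.com__929091__2.0"
match = NUMBER_BETWEEN_UNDERSCORES.search(sample)
number = match.group(1) if match else None  # None when the line has no such number
print(number)
```

The compiled pattern behaves the same as the one-line form; the verbose layout just keeps each piece next to its explanation.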

Python code:

import re
s = r"""https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0"""
print(re.findall(r'__([0-9]+)__', s))

Results: ['929091', '929092', '929090', '929092', '1205024', '929090', '929092', '1205024']
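
To see why the attempt in the question kept the underscores, it may help to compare the patterns on a single line. This is a small sketch. The lookaround variant is an alternative that was not part of the original answer, and the variable names are only illustrative:

```python
import re

line = "https://http-google-ghh.vault.com__929091__2.0"

# Attempt from the question: each '_' in the pattern consumes only ONE
# underscore, so the greedy '.*' captures the inner underscores as well.
greedy = re.findall(r'_(.*)_', line)

# Double underscores in the pattern plus a digits-only group.
fixed = re.findall(r'__([0-9]+)__', line)

# Alternative (an assumption, not from the answer): lookarounds match only
# the digits, so no capture group is needed.
lookaround = re.findall(r'(?<=__)[0-9]+(?=__)', line)

print(greedy)      # ['_929091_']
print(fixed)       # ['929091']
print(lookaround)  # ['929091']
```

Restricting the group to [0-9]+ also guards against lines where other text happens to appear between the underscores.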
