I am trying to extract the number between the two pairs of double underscores. For example, in the text below:
https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
I need to get only the numbers: 929091, 929092, etc.
I tried '_(.*)_',
but I get the underscores too. I just need the number.
CodePudding user response:
Use
re.findall(r'__([0-9]+)__', s)
EXPLANATION
--------------------------------------------------------------------------------
__ '__'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
__ '__'
import re
s = r"""https://http-google-ghh.vault.com__929091__2.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0
https://http-google-ghh.vault.com__929090__1.0
https://http-google-ghh.vault.com__929092__2.0
https://http-google-ghh.vault.com__1205024__1.0"""
print(re.findall(r'__([0-9]+)__', s))
Results: ['929091', '929092', '929090', '929092', '1205024', '929090', '929092', '1205024']
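For comparison, here is a small sketch of why the pattern from the question keeps the underscores, using one line of the sample input. The variable names (line, greedy, digits, lookaround) are only illustrative, and the lookaround version is an alternative I am adding, not something the question asked for.

```python
import re

line = "https://http-google-ghh.vault.com__929091__2.0"

# Pattern from the question: each '_' in the pattern consumes only ONE
# underscore, and the greedy '.*' stretches to the last underscore on the
# line, so the inner underscore of each pair ends up inside the group.
greedy = re.findall(r'_(.*)_', line)
print(greedy)       # ['_929091_']

# Matching both double underscores and allowing only digits in the group
# leaves just the number.
digits = re.findall(r'__([0-9]+)__', line)
print(digits)       # ['929091']

# Alternative without a capture group: lookbehind/lookahead assert the
# double underscores without making them part of the match.
lookaround = re.findall(r'(?<=__)\d+(?=__)', line)
print(lookaround)   # ['929091']
```

If you need integers rather than strings, something like [int(n) for n in digits] should do it.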