Python RegEx: Extract timestamp between Square brackets

Time:08-18

I have the following source data:

14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048

I want to extract the text between the square brackets, for example 21/Jun/2019:15:45:24 -0700.

I have written a regex, but it does not look optimal. Is there a better way to get the desired result?

re.findall(r"([0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\s-[0-9]{4})", data)

I have also tried lookbehind (?<=...) and lookahead (?=...) assertions, but the special characters in the data cause problems. Any suggestions would be appreciated.
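For reference, here is a self-contained sketch of the pattern from the question, rewritten with \d shorthands and precompiled. One change is an assumption on my part: the original only accepts negative UTC offsets (\s-), so this version uses [+-] to also accept offsets such as +0530, which do not appear in the sample data.

```python
import re

# First two log lines from the question
data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554"""

# Same structure as the original pattern: dd/Mon/yyyy:hh:mm:ss, whitespace, UTC offset.
# [+-] (assumed) accepts positive offsets as well as negative ones.
TIMESTAMP = re.compile(r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4}")

timestamps = TIMESTAMP.findall(data)
print(timestamps)  # ['21/Jun/2019:15:45:24 -0700', '21/Jun/2019:15:45:25 -0700']
```

This version does not depend on the square brackets at all, so bracket escaping never comes up; the trade-off is that it only finds timestamps in exactly this format.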

Answer 1:

I would simplify your regex pattern: match a leading IP address, followed by a dash and a username, and then capture the timestamp inside the square brackets.

import re

inp = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048"""

# re.M lets ^ match at the start of every line: IP, " - ", username, then the bracketed timestamp
timestamps = re.findall(r'^\d+(?:\.\d+){3} - \w+ \[(.*?)\]', inp, flags=re.M)
print(timestamps)

This prints (output wrapped for readability):

[
    '21/Jun/2019:15:45:24 -0700',
    '21/Jun/2019:15:45:25 -0700',
    '21/Jun/2019:15:45:27 -0700',
    '21/Jun/2019:15:45:28 -0700'
]
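To make the pattern in this answer easier to read, it can be written with re.VERBOSE so that each piece carries a comment. This is a sketch, not the answerer's exact code: I have swapped the lazy (.*?) for the negated character class ([^\]]*), which captures everything up to the next closing bracket and gives the same result on this data. In verbose mode unescaped whitespace is ignored, so literal spaces are written as "\ ".

```python
import re

# Two of the log lines from the question
inp = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048"""

LOG_TIMESTAMP = re.compile(r"""
    ^\d+(?:\.\d+){3}    # dotted IP-like field at the start of a line
    \ -\ \w+\ \[        # space, dash, space, username, space, opening bracket
    ([^\]]*)            # capture every character that is not a closing bracket
    \]                  # the closing bracket itself (not captured)
""", re.VERBOSE | re.MULTILINE)

timestamps = LOG_TIMESTAMP.findall(inp)
print(timestamps)  # ['21/Jun/2019:15:45:24 -0700', '21/Jun/2019:15:45:28 -0700']
```

Anchoring on the IP and username makes the match stricter than searching for any bracketed text, which helps if other fields could ever contain brackets.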

Answer 2:

This might be what you are looking for: re.findall(r"(?<=\[).*?(?=\])", data), which returns ['21/Jun/2019:15:45:24 -0700'] for your first line. Escaping the brackets as \[ and \] is what takes care of the special characters.
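A minimal sketch of the lookaround version applied to several lines at once, using two lines from the question as sample data. Because "." does not match a newline unless re.DOTALL is set, a match cannot run past the end of a line.

```python
import re

data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701"""

# (?<=\[) requires a "[" just before the match and (?=\]) a "]" just after it.
# Neither bracket is consumed, so findall returns only the text in between.
timestamps = re.findall(r"(?<=\[).*?(?=\])", data)
print(timestamps)  # ['21/Jun/2019:15:45:24 -0700', '21/Jun/2019:15:45:27 -0700']
```

This assumes the timestamp is the only bracketed text on each line; any other [...] field would also be returned.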

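Another option is plain str.split(): data.split('[')[1].split(']')[0]. This returns only the text between the first pair of brackets, so for multi-line data apply it to each line separately.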
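A sketch of the split approach applied line by line. It assumes each line has at most one bracketed field and that lines without a "[" (for example blank lines) should simply be skipped.

```python
data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
"""

# For each line: keep the text after the first "[", then cut it at the next "]".
# Lines without "[" are skipped, since split('[')[1] would raise IndexError on them.
timestamps = [
    line.split('[')[1].split(']')[0]
    for line in data.splitlines()
    if '[' in line
]
print(timestamps)  # ['21/Jun/2019:15:45:24 -0700', '21/Jun/2019:15:45:28 -0700']
```

No regex is involved, so there is nothing to escape; the cost is that the code silently trusts the line layout.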
