Using Python regex to extract basic HTTP request data - lines without users are not formatted correctly

Time:04-06

I'm trying to use regex finditer() to extract basic HTTP request data into a list of dictionaries. The raw data is as follows:

logdata2 = """13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149"""

Line 2 has only a '-' for the user, so I need an empty string for the user instead, like this:

Expected result: {'host': '159.253.153.40', 'user_name': '', 'time': '21/Jun/2019:15:46:10 -0700', 'request': 'POST /e-business HTTP/1.0'}

Lines 1 and 3 work fine with my regex code below, but on line 2 the host value picks up a trailing ' -'. Can anyone tell where I'm going wrong?

Actual result: {'host': '159.253.153.40 -', 'user_name': '', 'time': '21/Jun/2019:15:46:10 -0700', 'request': 'POST /e-business HTTP/1.0'}

Thanks!

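import re

result2 = []
pattern2 = r"""
(?P<host>.*)          # host
(\s-\s?)              # separator after the host
(?P<user_name>\w*)    # user name
(\s\[)                # opening bracket of the timestamp
(?P<time>.*)          # timestamp
(\])                  # closing bracket
(\s")                 # opening quote of the request
(?P<request>.*)       # request line
(")                   # closing quote
"""
for item in re.finditer(pattern2, logdata2, re.VERBOSE):
    result2.append(item.groupdict())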
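Isolating the host and user part of the pattern on line 2 shows where the ' -' comes from. A greedy `.*` first takes the whole line and then gives characters back one at a time, and the first split that lets the rest of the pattern succeed is host = '159.253.153.40 -': `\s-` matches the second " -", `\s?` and `\w*` both match nothing, and `\s\[` matches " [". This is a minimal sketch using only the first few groups; `line2` and `head` are illustrative names.

```python
import re

# Line 2 of the log data: the user field is only "-"
line2 = '159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845'

# Host/user part of the question's pattern: greedy .*, an optional
# trailing space (\s?) and a user name (\w*) that may match nothing
head = re.compile(r"(?P<host>.*)\s-\s?(?P<user_name>\w*)\s\[")

m = head.match(line2)
print(m.groupdict())
# {'host': '159.253.153.40 -', 'user_name': ''}
```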

Answer:

A possible solution is the following:

pattern2 = r"(?P<host>.*)\s-\s(?P<user_name>.*)\s\[(?P<time>.*)\]\s\"(?P<request>.*)\""
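As written, this pattern fixes the host, but on line 2 it captures '-' as the user name rather than the empty string the question asks for. A minimal sketch of using it with `re.finditer()`, plus an assumed post-processing step that maps '-' to '' (the `record` variable is illustrative):

```python
import re

logdata2 = """13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149"""

# The pattern from the answer, unchanged
pattern2 = r"(?P<host>.*)\s-\s(?P<user_name>.*)\s\[(?P<time>.*)\]\s\"(?P<request>.*)\""

result2 = []
for item in re.finditer(pattern2, logdata2):
    record = item.groupdict()
    # The log writes "-" when there is no user; store "" instead
    if record["user_name"] == "-":
        record["user_name"] = ""
    result2.append(record)

print(result2[1])
# {'host': '159.253.153.40', 'user_name': '', 'time': '21/Jun/2019:15:46:10 -0700', 'request': 'POST /e-business HTTP/1.0'}
```

This relies on the data having one log entry per line: `.` does not match a newline, which is what keeps each greedy `.*` inside a single entry.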

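A different approach, sketched here as an alternative rather than as the answer's method: bound each field by the characters it cannot contain (`\S+`, `[^\]]*`, `[^"]*`) instead of using `.*`, and make the user group optional so that `Match.groupdict(default="")` fills in the empty string when the log has '-'. The name `log_line` is illustrative, and the pattern assumes the single-space layout shown in the sample data.

```python
import re

logdata2 = """13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149"""

# Each field stops at its own delimiter, so no group can run past it
# the way a greedy .* can.
log_line = re.compile(
    r"(?P<host>\S+)"             # client address: a run of non-spaces
    r" - "                       # literal " - " after the host
    r"(?:-|(?P<user_name>\S+))"  # "-" means no user; otherwise capture the name
    r" \[(?P<time>[^\]]*)\]"     # timestamp: everything up to the closing bracket
    r' "(?P<request>[^"]*)"'     # request line: everything up to the closing quote
)

# default="" turns the non-participating user_name group into an empty string
result2 = [m.groupdict(default="") for m in log_line.finditer(logdata2)]
for record in result2:
    print(record)
```

Because every field is limited by an explicit delimiter, this version does not rely on `.` refusing to match newlines.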
