I have a log file with a lot of lines. Example:
219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] "GET /incubate/incubate HTTP/1.1" 201 12126
I need the output to look something like this:
{host: 219.133.7.154, user: price5585, date: 21/Jun/2019:15:45:53 -0700, req: GET /incubate/incubate HTTP/1.1}
I'm really struggling with this and have only got the first two fields working. Here is my code:
pattern = """
(?P<host>.*) #Host name
(-\ )
(?P<username>\w*) #username
(?P<time>\w*) #Time
"""
What should the pattern look like so that I can extract everything I need?
Answer:
Do you have to use a regex? Your goal can easily be achieved by splitting the line on whitespace:
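#!/usr/bin/env python3
LINE = "219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] \"GET /incubate/incubate HTTP/1.1\" 201 12126"
# The request is the text between the first pair of double quotes.
body = LINE.split("\"")[1]
# Space-separated fields: host, ident, user, date (two fields), request (three fields), status, size
split_line = LINE.split(" ")
output_dict = {"host": split_line[0],
               "user": split_line[2],
               # Join the two timestamp fields and strip the surrounding brackets.
               "date": " ".join([split_line[3], split_line[4]]).strip("[]"),
               "req": body}
print(output_dict)
Output:
{'host': '219.133.7.154', 'user': 'price5585', 'date': '21/Jun/2019:15:45:53 -0700', 'req': 'GET /incubate/incubate HTTP/1.1'}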
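Since the question mentions a log file with many lines, the same splitting idea can be wrapped in a function and applied line by line. This is a minimal sketch, assuming every line follows the example's layout (no spaces inside the user field, a quoted request). The names `parse_line` and `parse_log` are hypothetical, and `io.StringIO` stands in for a real file opened with `open(...)`.

```python
import io


def parse_line(line):
    """Split one log line into host, user, date and request (hypothetical helper)."""
    # The request is the text between the first pair of double quotes.
    req = line.split('"')[1]
    fields = line.split(" ")
    return {
        "host": fields[0],
        "user": fields[2],
        # The timestamp spans two fields; drop the surrounding brackets.
        "date": " ".join(fields[3:5]).strip("[]"),
        "req": req,
    }


def parse_log(lines):
    """Yield one dict per non-empty line of an iterable such as an open file."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_line(line)


# io.StringIO stands in for open("access.log") (hypothetical file name).
sample = io.StringIO(
    '219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] '
    '"GET /incubate/incubate HTTP/1.1" 201 12126\n'
)
for record in parse_log(sample):
    print(record)
```

Because `parse_log` is a generator, a large file is processed one line at a time instead of being loaded into memory at once.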
Answer:
Using a regular expression:
import re
line = '219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] "GET /incubate/incubate HTTP/1.1" 201 12126'
# Groups: IP address, user name, timestamp inside [...], request inside quotes
pat = r'((?:\d+\.){3}\d+) - (\w+) \[([^\]]+)\] "([^"]+)"'
m = re.match(pat, line)
dic = {'host': m.group(1), 'user': m.group(2),
       'date': m.group(3), 'req': m.group(4)}
print(dic)
Output:
{'host': '219.133.7.154', 'user': 'price5585', 'date': '21/Jun/2019:15:45:53 -0700', 'req': 'GET /incubate/incubate HTTP/1.1'}
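The question attempted a verbose pattern with named groups. That style also works, and `groupdict()` then builds the dictionary directly. This is a minimal sketch, assuming the lines follow the example's layout; the name `LOG_PATTERN` is hypothetical. In `re.VERBOSE` mode, whitespace outside character classes is ignored and `#` starts a comment, so literal spaces are written here as `[ ]`.

```python
import re

# re.VERBOSE ignores whitespace outside character classes and treats "#" as a
# comment, so each literal space is written as the character class [ ].
LOG_PATTERN = re.compile(
    r"""
    ^(?P<host>\S+)[ ]          # client IP or host name
    \S+[ ]                     # identity field ('-' in the example)
    (?P<user>\S+)[ ]           # user name
    \[(?P<date>[^\]]+)\][ ]    # timestamp without the brackets
    "(?P<req>[^"]*)"           # request line inside the quotes
    """,
    re.VERBOSE,
)

line = '219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] "GET /incubate/incubate HTTP/1.1" 201 12126'
m = LOG_PATTERN.match(line)
if m:
    print(m.groupdict())
    # {'host': '219.133.7.154', 'user': 'price5585', 'date': '21/Jun/2019:15:45:53 -0700', 'req': 'GET /incubate/incubate HTTP/1.1'}
```

Lines that do not fit this layout make `match` return `None`, which is why the sketch checks `m` before calling `groupdict()`.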