Regex find groups and matches from a log file

Time:10-10

I have a log file with a lot of lines. Example:

219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] "GET /incubate/incubate HTTP/1.1" 201 12126

I need as the output something like this:

{host: 219.133.7.154, user: price5585, date: 21/Jun/2019:15:45:53 -0700, req: GET /incubate/incubate HTTP/1.1}

I'm struggling with this and have only got the first two fields working. Here is my code:

pattern = """
(?P<host>.*) #Host name
(-\ ) 
(?P<username>\w*)  #username

(?P<time>\w*) #Time

"""

How should the pattern look so that I can extract everything I need?

CodePudding user response:

Do you have to use regex? Your goal can easily be achieved by splitting the line into whitespace-separated fields and taking the request from between the double quotes:

#!/usr/bin/env python3

LINE = "219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] \"GET /incubate/incubate HTTP/1.1\" 201 12126"

body = LINE.split("\"")[1]
split_line = LINE.split(" ")

output_dict = {"host": split_line[0],
               "user": split_line[2],
               "date": " ".join(split_line[3:5]).strip("[]"),  # drop the surrounding brackets
               "req": body}
print(output_dict)

Output:

{'host': '219.133.7.154', 'user': 'price5585', 'date': '21/Jun/2019:15:45:53 -0700', 'req': 'GET /incubate/incubate HTTP/1.1'}

CodePudding user response:

With the use of regex:

import re
line = '219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] "GET /incubate/incubate HTTP/1.1" 201 12126'
pat = r'((?:\d+\.){3}\d+) - (\w+) \[([^\]]+)\] \"([^\"]+)'
m = re.match(pat, line)
dic = {'host': m.group(1), 'user': m.group(2), \
    'date': m.group(3), 'req': m.group(4)}
print(dic)

{'host': '219.133.7.154', 'user': 'price5585', 'date': '21/Jun/2019:15:45:53 -0700', 'req': 'GET /incubate/incubate HTTP/1.1'}
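
The question started from a multi-line pattern with named groups, so here is a minimal sketch of that style. It assumes every line follows the layout of the sample line. The names `LOG_PATTERN` and `parse_line` are made up for illustration. The key points are that a commented, multi-line pattern needs the `re.VERBOSE` flag, and that each field is better matched with a bounded class such as `\S+` or `[^\]]+` than with a greedy `.*`.

```python
import re

# Verbose pattern with named groups; the group names mirror the keys
# requested in the question. LOG_PATTERN is an illustrative name.
LOG_PATTERN = re.compile(r"""
    (?P<host>\S+)             # host: everything up to the first whitespace
    \s+-\s+                   # the " - " separator
    (?P<user>\S+)             # user name
    \s+\[(?P<date>[^\]]+)\]   # timestamp between square brackets
    \s+"(?P<req>[^"]+)"       # request line between double quotes
""", re.VERBOSE)


def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None


line = '219.133.7.154 - price5585 [21/Jun/2019:15:45:53 -0700] "GET /incubate/incubate HTTP/1.1" 201 12126'
print(parse_line(line))
```

For a whole log file, `parse_line` can be called on each line in turn, skipping the lines for which it returns `None`.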