Extract data from a file through regex

Time:11-14

146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048

All I want is to convert the above data into a list of dictionaries, where each dictionary looks like the following:

example_dict = {"host":"146.204.224.152", 
                "user_name":"feest6811", 
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"}

Kindly help me, I am new to this!

CodePudding user response:

You could use

^
(?P<host>\d+\S+)[-\s]+
(?P<user_name>\S+)\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"

See a demo on regex101.com.


In Python this could be

import re

pattern = re.compile(r"""
    ^
    (?P<host>\d+\S+)[-\s]+
    (?P<user_name>\S+)\s+
    \[(?P<time>[^][]+)\]\s+
    "(?P<request>[^"]+)"
""", re.MULTILINE | re.VERBOSE)

data = """
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
"""

for match in pattern.finditer(data):
    dct = match.groupdict()
    print(dct)

This would yield

{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web services HTTP/2.0'}
{'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}
{'host': '100.32.205.59', 'user_name': 'ortiz8891', 'time': '21/Jun/2019:15:45:28 -0700', 'request': 'PATCH /architectures HTTP/1.0'}
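
The loop above prints each dictionary, while the question asks for a list of dictionaries. A minimal sketch of that last step, assuming the same named-group pattern and using only two of the sample lines (the names LOG_PATTERN, sample and records are made up for this illustration):

```python
import re

# Same named-group pattern as in the answer above.
LOG_PATTERN = re.compile(r"""
    ^
    (?P<host>\d+\S+)[-\s]+
    (?P<user_name>\S+)\s+
    \[(?P<time>[^][]+)\]\s+
    "(?P<request>[^"]+)"
""", re.MULTILINE | re.VERBOSE)

# Two of the sample lines from the question.
sample = """\
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
"""

# groupdict() returns the named groups as a dict; collect one per match.
records = [m.groupdict() for m in LOG_PATTERN.finditer(sample)]
print(len(records))
print(records[0]["user_name"])
```

Because finditer only yields lines that match, lines in a different format are simply left out of the list.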

CodePudding user response:

This code uses re.search to match each line, then stores the captured groups in the dictionary unit_d. The list fulllist collects all the dictionaries.

import re

filename = 'c:/test/log.txt'
fulllist = []
with open(filename) as file:
    for line in file:
        text = line.rstrip()
        # groups: host, user name, bracketed timestamp, quoted request
        finder = re.search(r'([\d\.]+)[\s-]+(\w+) \[([\w/: -]+)\] "([^"]+)', text)
        if finder is None:
            continue  # skip lines that do not look like a log entry
        unit_d = dict()
        unit_d['host'] = finder.group(1)
        unit_d['user_name'] = finder.group(2)
        unit_d['time'] = finder.group(3)
        unit_d['request'] = finder.group(4)
        print(unit_d)
        fulllist.append(unit_d)

results

{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web services HTTP/2.0'}
{'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}
{'host': '100.32.205.59', 'user_name': 'ortiz8891', 'time': '21/Jun/2019:15:45:28 -0700', 'request': 'PATCH /architectures HTTP/1.0'}
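
The two answers can also be combined: named groups from the first, line-by-line reading from the second. The following is only a sketch under that assumption. The function name parse_log_lines and the constant LINE_PATTERN are hypothetical, and the function takes any iterable of lines so it can be tried without a file.

```python
import re

# Single-line pattern with named groups (an illustrative variant, not
# either answer's exact regex).
LINE_PATTERN = re.compile(
    r'(?P<host>[\d.]+)[\s-]+(?P<user_name>\w+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]+)"'
)

def parse_log_lines(lines):
    """Return one dict per matching line; non-matching lines are skipped."""
    records = []
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match is not None:
            records.append(match.groupdict())
    return records

# Hypothetical input: one real sample line and one malformed line.
lines = [
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n',
    'this line is not a log entry\n',
]
parsed = parse_log_lines(lines)
print(parsed)
```

An open file object iterates over its lines, so the same function should also work as parse_log_lines(file) inside a with open(filename) as file: block.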