Using regex to make a dictionary in Python: not sure how to put the code together

I've been tasked with converting a string into a dict, and it has to be done with regex. I've used findall to separate each element, but I'm not sure how to put the pieces together.

I have the following code:

import re

def edata():
    with open("downloads/employeedata.txt", "r") as file:
        edata = file.read()
        IP_field = return re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
        username_field = re.findall (r"[a-z]+\d+|- -", employeedata)
        date_field = re.findall (r"\d+\/[A-Z][a-z]+\/\d\d\d\d:\d+:\d+:\d+ -\d+", employeedata)
        type_ field = return re.findall (r'"(.*)?"', employeedata)
        Fields = ["IP","username","date","type"]
        Fields2 = IP_field, username_field, date_field, type_field
        dictionary = dict(zip(Fields,Fields2 ))

print(edata())

Not sure why it's not working.

CodePudding user response:

Here is another solution that builds on the dictionary you have already constructed. It uses a list comprehension and the zip function to turn that dictionary of lists into a list of dictionaries, one per record.

import re

def edata():
  with open("employeedata.txt", "r") as file:
    employeedata = file.read()
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall (r"[a-z]+\d+|- -", employeedata)

    date_field = re.findall (r"\[(.*?)\]", employeedata) ## changed your regex for the date field

    type_field = re.findall (r'"(.*)?"', employeedata)
    Fields = ["IP","username","date","type"]
    Fields2 = IP_field, username_field, date_field, type_field
    dictionary = dict(zip(Fields,Fields2))

    result_dictionary = [dict(zip(dictionary, i)) for i in zip(*dictionary.values())] ## convert to list of dictionaries
    return result_dictionary


print(edata())
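
To see what the final list comprehension does in isolation, here is a minimal sketch that needs no input file. The field lists are made-up values standing in for the re.findall results, and rows_from_columns is a name introduced here for illustration only.

```python
# Hypothetical field lists, standing in for the re.findall results.
columns = {
    "IP": ["10.0.0.1", "10.0.0.2"],
    "username": ["alice1", "bob2"],
    "date": ["19/Jan/2012", "16/Feb/2021"],
    "type": ["GET /a", "GET /b"],
}

def rows_from_columns(columns):
    """Turn a dict of equal-length lists into a list of per-record dicts."""
    # zip(*values) yields one tuple per record; iterating a dict yields its keys,
    # so dict(zip(columns, row)) pairs each value with its field name.
    return [dict(zip(columns, row)) for row in zip(*columns.values())]

records = rows_from_columns(columns)
print(records[0])
```

Note that zip stops at the shortest list, so if one of the patterns fails to match on some line, records are dropped or misaligned without any error.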

CodePudding user response:

You can use a single compiled pattern with named groups and collect match.groupdict() for each line:

import re

rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')

def edata():
    results = []
    with open("downloads/employeedata.txt", "r") as file:
        for line in file:
            match = rx.search(line)
            if match:
                results.append(match.groupdict())
    return results
    
print(edata())

For the input file = ['190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"', '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"'] (a list of lines standing in for the open file), the output will be:

[{'IP': '190.912.120.151', 'Username': 'skynet10001', 'Date': '19/Jan/2012', 'Type': 'Temp'}, {'IP': '221.143.119.260', 'Username': 'terminator002', 'Date': '16/Feb/2021', 'Type': 'Temp 2'}]
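
The sample run above can be reproduced without a file on disk. This is a sketch under the assumption that the lines look like the two samples; parse_lines and sample are names introduced here, and the third line is added only to show that non-matching lines are skipped.

```python
import re

# Same pattern as in the answer, split across two string literals for readability.
rx = re.compile(
    r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+'
    r'\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"'
)

def parse_lines(lines):
    """Return one dict per line that matches the pattern; skip the rest."""
    return [m.groupdict() for m in map(rx.search, lines) if m]

sample = [
    '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"',
    '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"',
    'not a log line',  # no match, so it produces no dictionary
]
print(parse_lines(sample))
```

Because parse_lines accepts any iterable of strings, an open file object can be passed in place of the list.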

The regex is

^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"

Details:

^ - start of string
(?P<IP>\d+(?:\.\d+){3}) - Group "IP": one or more digits and then three occurrences of a . and one or more digits
\s+\S+\s+ - one or more non-whitespace chars enclosed with one or more whitespace chars on both ends
(?P<Username>[a-z]+\d+) - Group "Username": one or more lowercase ASCII letters and then one or more digits
\s+ - one or more whitespaces
\[ - a [ char
(?P<Date>[^][]+) - Group "Date": one or more chars other than [ and ]
]\s+" - a ] char, one or more whitespaces, "
(?P<Type>[^"]*) - Group "Type": zero or more chars other than "
" - a " char.