Using regex to make a dictionary in Python: not sure how to put the code together

Time:12-13

So, I've been tasked with converting a string into a dict (it has to be done using regex). I've used findall to separate each element, but I'm not sure how to put it together.

I have the following code:

import re

def edata():
    with open("downloads/employeedata.txt", "r") as file:
        edata = file.read()
        IP_field = return re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
        username_field = re.findall (r"[a-z]+\d+|- -", employeedata)
        date_field = re.findall (r"\d+\/[A-Z][a-z]+\/\d\d\d\d:\d+:\d+:\d+ -\d+", employeedata)
        type_ field = return re.findall (r'"(.*)?"', employeedata)
        Fields = ["IP","username","date","type"]
        Fields2 = IP_field, username_field, date_field, type_field
        dictionary = dict(zip(Fields,Fields2 ))

print(edata())

Not sure why it's not working.

Answer:

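Here is another solution that reuses the dictionary you have already built. It reads the file into employeedata (your code stores it in edata but then searches employeedata), drops the return keywords from the assignments (they are a syntax error there), and actually returns a result. A list comprehension with zip() then turns the dictionary of lists into a list of dictionaries, one per record.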

import re

def edata():
    with open("employeedata.txt", "r") as file:
        employeedata = file.read()
        # Each findall collects one field from every line of the file.
        IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
        username_field = re.findall(r"[a-z]+\d+|- -", employeedata)
        date_field = re.findall(r"\[(.*?)\]", employeedata)  ## changed your regex for the date field
        type_field = re.findall(r'"(.*?)"', employeedata)  # non-greedy: one match per quoted field
        Fields = ["IP", "username", "date", "type"]
        Fields2 = IP_field, username_field, date_field, type_field
        dictionary = dict(zip(Fields, Fields2))

        ## convert the dict of lists into a list of dictionaries, one per record
        result_dictionary = [dict(zip(dictionary, i)) for i in zip(*dictionary.values())]
        return result_dictionary


print(edata())
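The list comprehension on the result_dictionary line transposes a dict of parallel lists into a list of per-record dicts. A minimal sketch of just that step, with hard-coded lists standing in for the findall results (the values are hypothetical, borrowed from the sample input in the other answer):

```python
# Hypothetical stand-ins for the four findall() results.
dictionary = {
    "IP": ["190.912.120.151", "221.143.119.260"],
    "username": ["skynet10001", "terminator002"],
    "date": ["19/Jan/2012", "16/Feb/2021"],
    "type": ["Temp", "Temp 2"],
}

# zip(*values) yields one tuple per record: (ip, username, date, type).
# zip(dictionary, row) pairs each key with the value at the same position.
records = [dict(zip(dictionary, row)) for row in zip(*dictionary.values())]

for record in records:
    print(record)
```

Note that zip stops at the shortest list, so if one findall misses a field on some line, the later values pair up with the wrong records and the last record is dropped. Matching each line as a whole, as the named-groups answer does, avoids this.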

Answer:

You can use a single regex with named groups and match it against each line of the file:

import re

rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')

def edata():
    results = []
    with open("downloads/employeedata.txt", "r") as file:
        for line in file:
            match = rx.search(line)
            if match:
                results.append(match.groupdict())
    return results
    
print(edata())

For the input file = ['190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"', '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"'] (a list of lines used in place of the opened file), the output is:

[{'IP': '190.912.120.151', 'Username': 'skynet10001', 'Date': '19/Jan/2012', 'Type': 'Temp'}, {'IP': '221.143.119.260', 'Username': 'terminator002', 'Date': '16/Feb/2021', 'Type': 'Temp 2'}]
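The question reads the whole file into one string with file.read(). A minimal sketch of applying the same pattern to such a string, assuming one record per line: re.MULTILINE makes ^ match at the start of every line, and re.finditer yields one match per record. The string below is simply the sample input above joined with newlines.

```python
import re

# Same pattern as above; re.MULTILINE lets ^ match right after each newline.
rx = re.compile(
    r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+'
    r'\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"',
    re.MULTILINE,
)

# Stand-in for the text returned by file.read().
employeedata = (
    '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"\n'
    '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"\n'
)

# One dict per matching line, keyed by the group names.
results = [m.groupdict() for m in rx.finditer(employeedata)]
print(results)
```

Because \s also matches newlines, a malformed line could let a match run on into the next line; using [ \t]+ instead of \s+ is a stricter choice if that matters for your data.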

The regex is

^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"

Details:

  • ^ - start of string
  • (?P<IP>\d+(?:\.\d+){3}) - Group "IP": one or more digits and then three occurrences of a . and one or more digits (the numbers are not range-checked, so 190.912.120.151 is accepted)
  • \s+\S+\s+ - one or more non-whitespace chars enclosed with one or more whitespace chars on both ends
  • (?P<Username>[a-z]+\d+) - Group "Username": one or more lowercase ASCII letters and then one or more digits
  • \s+ - one or more whitespace chars
  • \[ - a [ char
  • (?P<Date>[^][]+) - Group "Date": one or more chars other than [ and ]
  • ]\s+" - a ] char, one or more whitespace chars, and a "
  • (?P<Type>[^"]*) - Group "Type": zero or more chars other than "
  • " - a " char.
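For readability, the same pattern can be written with re.VERBOSE, which ignores whitespace outside character classes and treats # as the start of a comment, so each item in the breakdown above becomes an inline comment. A sketch, intended to match the same strings as the compact pattern and checked here against one of the sample lines:

```python
import re

# The named-groups pattern in re.VERBOSE form: whitespace outside character
# classes is ignored and "#" starts a comment, so each piece is annotated.
rx = re.compile(r'''
    ^
    (?P<IP>\d+(?:\.\d+){3})     # digits, then three times: a dot and digits
    \s+\S+\s+                   # a non-whitespace token (e.g. "-") between whitespace
    (?P<Username>[a-z]+\d+)     # lowercase ASCII letters followed by digits
    \s+
    \[(?P<Date>[^][]+)]         # text inside square brackets, excluding [ and ]
    \s+
    "(?P<Type>[^"]*)"           # text inside double quotes
''', re.VERBOSE)

line = '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"'
match = rx.search(line)
if match:
    print(match.groupdict())
```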