Converting dictionary into a list of dictionaries

Time:12-14

I've been tasked with converting a string into a dict, and it has to be done with regex. I've used findall to extract each field, but I'm not sure how to put the pieces together.

I have the following code:

import re

def edata():
  with open("employeedata.txt", "r") as file:
    employeedata = file.read()
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall (r"[a-z]+\d+|- -", employeedata)
    date_field = re.findall (r"\d+\/[A-Z][a-z][0-9]+\/\d\d\d\d:\d+:\d+:\d+ -\d+", employeedata)
    type_field = re.findall (r'"(.*)?"', employeedata)
    Fields = ["IP","username","date","type"]
    Fields2 = IP_field, username_field, date_field, type_field
    dictionary = dict(zip(Fields,Fields2))
    return dictionary

print(edata())

Current output:

{ "IP": ["190.912.120.151", "190.912.120.151"], "username": ["skynet10001", "skynet10001"] etc }

Expected output:

[{ "IP": "190.912.120.151", "username": "skynet10001" etc },
{ "IP": "190.912.120.151", "username": "skynet10001" etc }]

CodePudding user response:

Here is another solution that builds on the dictionary you have already constructed. It uses a list comprehension and the zip function to turn that dictionary of lists into a list of dictionaries.

import re

def edata():
  with open("employeedata.txt", "r") as file:
    employeedata = file.read()
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall (r"[a-z]+\d+|- -", employeedata)

    date_field = re.findall (r"\[(.*?)\]", employeedata) ## changed your regex for the date field

    type_field = re.findall (r'"(.*)?"', employeedata)
    Fields = ["IP","username","date","type"]
    Fields2 = IP_field, username_field, date_field, type_field
    dictionary = dict(zip(Fields,Fields2))

    result_dictionary = [dict(zip(dictionary, i)) for i in zip(*dictionary.values())] ## convert to list of dictionaries
    return result_dictionary


print(edata())
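
To see the transposition step on its own, here is a minimal sketch that runs without the input file. The helper name `to_records` and the hard-coded sample values are assumptions for illustration only; the comprehension is the same one used in the code above.

```python
# Hypothetical sample data shaped like the dictionary of lists built above.
columns = {
    "IP": ["190.912.120.151", "221.143.119.260"],
    "username": ["skynet10001", "terminator002"],
}

def to_records(columns):
    """Turn {"key": [v1, v2, ...]} into [{"key": v1}, {"key": v2}, ...]."""
    keys = list(columns)
    # zip(*values) groups the i-th item of every list into one row.
    return [dict(zip(keys, row)) for row in zip(*columns.values())]

records = to_records(columns)
print(records)
```

Note that zip stops at the shortest list, so if one findall call returns fewer matches than the others, the trailing records are dropped without any error.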

CodePudding user response:

You can use

import re

rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')

def edata():
    results = []
    with open("downloads/employeedata.txt", "r") as file:
        for line in file:
            match = rx.search(line)
            if match:
                results.append(match.groupdict())
    return results
    
print(edata())

See the online Python demo. For the file = ['190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"', '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"'] input, the output will be:

[{'IP': '190.912.120.151', 'Username': 'skynet10001', 'Date': '19/Jan/2012', 'Type': 'Temp'}, {'IP': '221.143.119.260', 'Username': 'terminator002', 'Date': '16/Feb/2021', 'Type': 'Temp 2'}]

The regex is

^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"

See the regex demo. Details:

  • ^ - start of string
  • (?P<IP>\d+(?:\.\d+){3}) - Group "IP": one or more digits and then three occurrences of a . and one or more digits
  • \s+\S+\s+ - one or more non-whitespace chars enclosed with one or more whitespace chars on both ends
  • (?P<Username>[a-z]+\d+) - Group "Username": one or more lowercase ASCII letters and then one or more digits
  • \s+ - one or more whitespaces
  • \[ - a [ char
  • (?P<Date>[^][]+) - Group "Date": one or more chars other than [ and ]
  • ]\s+" - a ] char, one or more whitespaces, "
  • (?P<Type>[^"]*) - Group "Type": zero or more chars other than "
  • " - a " char.
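
The breakdown above can be checked against the two sample lines without touching the file system. This is only a sketch: the helper name `parse_lines` and the extra non-matching line are assumptions added for illustration, while the pattern and the two log lines are the ones shown in this answer.

```python
import re

# Same pattern as above, split across two string literals for readability.
rx = re.compile(
    r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+'
    r'\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"'
)

sample_lines = [
    '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"',
    '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"',
    'not a log line',  # hypothetical malformed line, expected to be skipped
]

def parse_lines(lines):
    """Return one dict of named groups per line that matches the pattern."""
    results = []
    for line in lines:
        match = rx.search(line)
        if match:
            results.append(match.groupdict())
    return results

parsed = parse_lines(sample_lines)
print(parsed)
```

Because each line is matched as a whole, the four fields of a record always stay together, and a line that does not fit the pattern is skipped rather than shifting the other fields out of alignment.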