So, I've been tasked with converting a string into a dict, and it has to be done with regex. I've used findall to extract each field, but I'm not sure how to put the results together.
I have the following code:
import re

def edata():
    with open("employeedata.txt", "r") as file:
        employeedata = file.read()
    # Each findall returns a list with one entry per match in the whole file
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall(r"[a-z]+\d+|- -", employeedata)
    date_field = re.findall(r"\d+\/[A-Z][a-z][0-9]+\/\d\d\d\d:\d+:\d+:\d+ -\d+", employeedata)
    type_field = re.findall(r'"(.*)?"', employeedata)
    Fields = ["IP", "username", "date", "type"]
    Fields2 = IP_field, username_field, date_field, type_field
    # Maps each field name to the full list of its matches
    dictionary = dict(zip(Fields, Fields2))
    return dictionary

print(edata())
Current output:
{ "IP": ["190.912.120.151", "190.912.120.151"], "username": ["skynet10001", "skynet10001"] etc }
Expected output:
[{ "IP": "190.912.120.151", "username": "skynet10001" etc },
{ "IP": "190.912.120.151", "username": "skynet10001" etc }]
CodePudding user response:
Another solution that uses the dictionary you have already constructed. This code uses a list comprehension and the zip function to produce a list of dictionaries from the existing dictionary variable.
import re

def edata():
    with open("employeedata.txt", "r") as file:
        employeedata = file.read()
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall(r"[a-z]+\d+|- -", employeedata)
    date_field = re.findall(r"\[(.*?)\]", employeedata)  # changed your regex for the date field
    type_field = re.findall(r'"(.*)?"', employeedata)
    Fields = ["IP", "username", "date", "type"]
    Fields2 = IP_field, username_field, date_field, type_field
    dictionary = dict(zip(Fields, Fields2))
    # convert the dict of lists to a list of dictionaries, one per record
    result_dictionary = [dict(zip(dictionary, i)) for i in zip(*dictionary.values())]
    return result_dictionary

print(edata())
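The list comprehension there is a transposition: zip(*dictionary.values()) takes the i-th item from every list and groups them into one record. A minimal sketch of just that step, with hard-coded lists standing in for the findall results (the name columns and the two-field layout are assumptions for illustration, not part of the code above):

```python
# Stand-in for the dict of lists built from the findall results.
columns = {
    "IP": ["190.912.120.151", "221.143.119.260"],
    "username": ["skynet10001", "terminator002"],
}

# zip(*columns.values()) yields one tuple per record: the i-th item of every list.
rows = list(zip(*columns.values()))

# Pair each tuple with the keys again to get one dict per record.
records = [dict(zip(columns, row)) for row in rows]
print(records)
```

Keep in mind that zip stops at the shortest list, so if one pattern fails to match on some line, the later values of that field shift up and end up paired with the wrong record, without any error being raised.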
CodePudding user response:
You can use
import re

# One pattern with a named group per field; a line either matches as a whole or is skipped
rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')

def edata():
    results = []
    with open("downloads/employeedata.txt", "r") as file:
        for line in file:
            match = rx.search(line)
            if match:
                # groupdict() maps each group name to the text it captured
                results.append(match.groupdict())
    return results

print(edata())
See the online Python demo. For the input file = ['190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"', '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"'], the output will be:
[{'IP': '190.912.120.151', 'Username': 'skynet10001', 'Date': '19/Jan/2012', 'Type': 'Temp'}, {'IP': '221.143.119.260', 'Username': 'terminator002', 'Date': '16/Feb/2021', 'Type': 'Temp 2'}]
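If the log is already in memory as a single string, as in the question, the same idea can probably be applied without looping over lines. This is only a sketch: it assumes re.MULTILINE is added so that ^ anchors at every line start, and the text variable is made up from the two sample lines rather than read from a real file.

```python
import re

# Same named-group pattern; re.MULTILINE lets ^ match at the start of every line.
rx = re.compile(
    r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+'
    r'\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"',
    re.MULTILINE,
)

# Sample text assumed for illustration (the two lines from the example input).
text = (
    '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"\n'
    '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"\n'
)

# finditer walks the whole string; each match becomes one dict.
records = [m.groupdict() for m in rx.finditer(text)]
print(records)
```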
The regex is
^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"
See the regex demo. Details:
^ - start of string
(?P<IP>\d+(?:\.\d+){3}) - Group "IP": one or more digits and then three occurrences of a . and one or more digits
\s+\S+\s+ - one or more non-whitespace chars enclosed with one or more whitespace chars on both ends
(?P<Username>[a-z]+\d+) - Group "Username": one or more lowercase ASCII letters and then one or more digits
\s+ - one or more whitespaces
\[ - a [ char
(?P<Date>[^][]+) - Group "Date": one or more chars other than [ and ]
]\s+" - a ] char, one or more whitespaces, and a " char
(?P<Type>[^"]*) - Group "Type": zero or more chars other than "
" - a " char.
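The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. A sketch under that assumption; the sample line is the first one from the example input, and the layout is just one way to split the pattern:

```python
import re

# The same pattern written with re.VERBOSE so each piece carries its own comment.
rx = re.compile(r'''
    ^                            # start of string
    (?P<IP>\d+(?:\.\d+){3})      # four dot-separated runs of digits
    \s+\S+\s+                    # a non-whitespace token between whitespace
    (?P<Username>[a-z]+\d+)      # lowercase letters followed by digits
    \s+\[                        # whitespace, then an opening bracket
    (?P<Date>[^][]+)             # everything up to the closing bracket
    ]\s+"                        # closing bracket, whitespace, opening quote
    (?P<Type>[^"]*)              # everything up to the closing quote
    "                            # closing quote
''', re.VERBOSE)

line = '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"'  # sample line
match = rx.search(line)
print(match.groupdict() if match else None)
```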