I have a text file in the format shown below. In Python, I want to convert this dataset into a table with six columns: Id, Cause, Code, Event Time, Severity and Severity Code.
Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4
Is it possible to convert the above dataset into the following form?
Id    Cause    Code  Event Time           Severity  Severity Code
0005  ERROR    307   2020-11-09 10:16:48  WARNING   5
0006  FAILURE  517   2020-11-09 10:19:47  MINOR     4
CodePudding user response:
Try this:
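import re
import pandas as pd

# Each field is "key = value". This assumes that fields on the same line are
# separated by two or more spaces; the last field on a line ends at the line end.
pattern = re.compile(r"(.+?)=(.+?)(?:\s{2,}|$)")
data = []
item = {}
with open("data.txt") as fp:
    for line in fp:
        for m in pattern.finditer(line):
            key, value = [m.group(i).strip() for i in [1, 2]]
            if key == "Id":
                # A new "Id" starts a new record, so store the previous one first.
                if item:
                    data.append(item)
                item = {"Id": value}
            else:
                item[key] = value
# Store the last record, if the file contained any.
if item:
    data.append(item)
df = pd.DataFrame(data)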
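If the fields in your file are separated by single spaces (as they appear in the question), the two-space rule above cannot tell where a value ends. Here is a minimal sketch of an alternative that relies on the six known field names instead. The names KEYS, FIELD_RE and parse_records are my own, and it assumes that no value contains a field name followed by "=".

```python
import re

# The six field names from the question. Longer names come first so that
# "Severity Code" is tried before "Severity" and "Code".
KEYS = ["Severity Code", "Event Time", "Severity", "Cause", "Code", "Id"]
KEY_ALT = "|".join(re.escape(k) for k in KEYS)

# A field is "<key> = <value>", where the value runs up to the next known key
# (or the end of the text). Any whitespace, including newlines, may separate fields.
FIELD_RE = re.compile(
    rf"({KEY_ALT})\s*=\s*(.*?)(?=\s+(?:{KEY_ALT})\s*=|\s*$)",
    re.DOTALL,
)


def parse_records(text):
    """Return a list of dicts, one per record; each "Id" starts a new record."""
    records = []
    for key, value in FIELD_RE.findall(text):
        if key == "Id" or not records:
            records.append({})
        # Collapse any internal whitespace in the value to single spaces.
        records[-1][key] = " ".join(value.split())
    return records


sample = """Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4"""

records = parse_records(sample)
for row in records:
    print(row)
```

Passing the result to pd.DataFrame(records) should then give one row per Id, with the columns in the order the fields first appear.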
CodePudding user response:
This is one way to convert the above data. I hope it helps!
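import re
import pandas as pd

x = """Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4"""

# Collapse all whitespace (including newlines) into single spaces.
formatted_text = ' '.join(x.split())

ids = re.findall(r"Id = ([^\s]+)", formatted_text)
cause = re.findall(r"Cause = ([^\s]+)", formatted_text)
# The lookbehind stops "Code" from also matching inside "Severity Code".
code = re.findall(r"(?<!Severity )Code = ([^\s]+)", formatted_text)
# The timestamp contains a space, so capture the date and the time together.
event_time = re.findall(r"Event Time = ([^\s]+ [^\s]+)", formatted_text)
severity = re.findall(r"Severity = ([^\s]+)", formatted_text)
severity_code = re.findall(r"Severity Code = ([^\s]+)", formatted_text)

# All lists must have the same length, i.e. every record needs every field.
info_dict = {
    "Id": ids,
    "Cause": cause,
    "Code": code,
    "Event Time": event_time,
    "Severity": severity,
    "Severity Code": severity_code,
}
df = pd.DataFrame.from_dict(info_dict)
print(df)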