Convert One Single row to Multiple Columns based on Categorization in Python

Time:05-08

I have a text file that follows the template below. I want to convert this dataset into six columns with the headers Id, Cause, Code, Event Time, Severity and Severity Code in Python:

  Id                = 0005      Cause          = ERROR      
  Code     = 307      Event Time              = 2020-11-09 10:16:48      
  Severity      = WARNING      
  Severity Code = 5      Id                = 0006      Cause          = FAILURE      
  Code     = 517      Event Time              = 2020-11-09 10:19:47      
  Severity      = MINOR      Severity Code = 4    

Is it possible to convert the dataset above into the following?

Id          Cause       Code     Event Time             Severity        Severity Code
0005        ERROR       307     2020-11-09 10:16:48     WARNING         5
0006        FAILURE     517     2020-11-09 10:19:47     MINOR           4

CodePudding user response:

Try this:

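import re
import pandas as pd

# Lazily capture "key = value" pairs; a pair ends at a run of two or more whitespace characters.
pattern = re.compile(r"(.+?)=(.+?)\s{2,}")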
data = []
item = {}

with open("data.txt") as fp:
    for line in fp:
        for m in pattern.finditer(line):
            key, value = [m.group(i).strip() for i in [1,2]]
            
            if key == "Id":
                if item:
                    data.append(item)
                item = {"Id": value}
            else:
                item[key] = value

    data.append(item)

df = pd.DataFrame(data)
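
For anyone who wants to try the idea without a file on disk, here is a minimal sketch of the same loop wrapped in a function and run on the sample from the question. The names `PAIR`, `parse_records` and `sample` are made up for illustration, and the pattern is a slight variation of the one above that also accepts end of line as a separator (an assumption, in case the last value has no trailing spaces). It uses only the standard library.

```python
import re

# A pair is "key = value"; pairs are separated by 2+ whitespace characters or end of line.
# The "|$" alternative is an assumption added for input without trailing spaces.
PAIR = re.compile(r"(.+?)=(.+?)(?:\s{2,}|$)")

def parse_records(lines):
    """Return one dict per record; every "Id" key starts a new record."""
    records, item = [], {}
    for line in lines:
        for m in PAIR.finditer(line):
            key, value = m.group(1).strip(), m.group(2).strip()
            if key == "Id" and item:
                # A new Id means the previous record is complete.
                records.append(item)
                item = {}
            item[key] = value
    if item:
        records.append(item)
    return records

sample = """  Id                = 0005      Cause          = ERROR
  Code     = 307      Event Time              = 2020-11-09 10:16:48
  Severity      = WARNING
  Severity Code = 5      Id                = 0006      Cause          = FAILURE
  Code     = 517      Event Time              = 2020-11-09 10:19:47
  Severity      = MINOR      Severity Code = 4"""

records = parse_records(sample.splitlines())
for record in records:
    print(record)
```

The resulting list of dicts can be passed to `pd.DataFrame(records)` as in the code above.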

CodePudding user response:

This is one way to convert the data above. I hope it helps!

import re
import pandas as pd

x = """Id                = 0005      Cause          = ERROR      
  Code     = 307      Event Time              = 2020-11-09 10:16:48      
  Severity      = WARNING      
  Severity Code = 5      Id                = 0006      Cause          = FAILURE      
  Code     = 517      Event Time              = 2020-11-09 10:19:47      
  Severity      = MINOR      Severity Code = 4"""

formatted_text = ' '.join(x.split())
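# After normalization every pair reads "Key = value" with single spaces.
ids = re.findall(r"Id = (\S+)", formatted_text)
cause = re.findall(r"Cause = (\S+)", formatted_text)
# The lookbehind keeps "Severity Code = ..." from matching as a plain "Code".
code = re.findall(r"(?<!Severity )Code = (\S+)", formatted_text)
# Event Time holds a date and a time separated by one space.
event_time = re.findall(r"Event Time = (\S+ \S+)", formatted_text)
severity = re.findall(r"Severity = (\S+)", formatted_text)
severity_code = re.findall(r"Severity Code = (\S+)", formatted_text)

info_dict = {
    "Id": ids,
    "Cause": cause,
    "Code": code,
    "Event Time": event_time,
    "Severity": severity,
    "Severity Code": severity_code
}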

df = pd.DataFrame.from_dict(info_dict)
print(df)