Home > database >  Splitting series of Strings into Dataframe
Splitting series of Strings into Dataframe

Time:10-20

I have a big series of strings that I want to split into a dataframe.

The series looks like this:

s = pd.Series({"1":"Name=Marc-Age=48-Car=Tesla", 
    "2":"Name=Ben-Job=Pilot-Car=Porsche", 
    "3":"Name=Tom-Age=24-Car=Ford"})  

I want to split this into a dataframe looking like this:

  Name  Age  Job   Car
1 Marc  48   Nan   Tesla
2 Ben   Nan  Pilot Porsche
3 Tom   24   Nan   Ford

I tried to split the strings first by "-" and then by "=" but don't understand how to continue after

df=s.str.split("-", expand=True)
for col in df.columns:
    df[col]=df[col].str.split("=")

I get this:

    `      0    1   2`
    `1  ['Name', 'Marc']    ['Age', '48']   ['Car', 'Tesla']`
    `2  ['Name', 'Ben'] ['Job', 'Pilot']    ['Car', 'Porsche']`
    `3  ['Name', 'Tom'] ['Age', '24']   ['Car', 'Ford']`

I don't know how to continue from here. I can't loop through the rows because my dataset is really big.

Can anyone help on how to go on from here?

CodePudding user response:

If you split then explode and split again you can then use a pivot.

import pandas as pd
s = pd.Series({"1":"Name=Marc-Age=48-Car=Tesla", 
    "2":"Name=Ben-Job=Pilot-Car=Porsche", 
    "3":"Name=Tom-Age=24-Car=Ford"})  

s = s.str.split('-').explode().str.split('=', expand=True).reset_index()
s = s.pivot(index='index', columns=0, values=1).reset_index(drop=True)

Output

   Age      Car    Job  Name
0   48    Tesla    NaN  Marc
1  NaN  Porsche  Pilot   Ben
2   24     Ford    NaN   Tom
  • Related