I have a big series of strings that I want to split into a dataframe.
The series looks like this:
s = pd.Series({"1":"Name=Marc-Age=48-Car=Tesla",
"2":"Name=Ben-Job=Pilot-Car=Porsche",
"3":"Name=Tom-Age=24-Car=Ford"})
I want to split this into a dataframe looking like this:
Name Age Job Car
1 Marc 48 Nan Tesla
2 Ben Nan Pilot Porsche
3 Tom 24 Nan Ford
I tried to split the strings first by "-" and then by "=" but don't understand how to continue after
df=s.str.split("-", expand=True)
for col in df.columns:
df[col]=df[col].str.split("=")
I get this:
` 0 1 2`
`1 ['Name', 'Marc'] ['Age', '48'] ['Car', 'Tesla']`
`2 ['Name', 'Ben'] ['Job', 'Pilot'] ['Car', 'Porsche']`
`3 ['Name', 'Tom'] ['Age', '24'] ['Car', 'Ford']`
I don't know how to continue from here. I can't loop through the rows because my dataset is really big.
Can anyone help on how to go on from here?
CodePudding user response:
If you split then explode and split again you can then use a pivot.
import pandas as pd
s = pd.Series({"1":"Name=Marc-Age=48-Car=Tesla",
"2":"Name=Ben-Job=Pilot-Car=Porsche",
"3":"Name=Tom-Age=24-Car=Ford"})
s = s.str.split('-').explode().str.split('=', expand=True).reset_index()
s = s.pivot(index='index', columns=0, values=1).reset_index(drop=True)
Output
Age Car Job Name
0 48 Tesla NaN Marc
1 NaN Porsche Pilot Ben
2 24 Ford NaN Tom