I'm completely new to regex and i'm facing this challenge that is taking me hours to solve. I have the following dataframe with a string column "Dimensions":
df
Dimensions
0 "Width:2 cm"
1 "Diameter: 1.2 cm, Height: 10 cm"
2 "Diameter: 3.4cm, Volume: 10cm"
I'm looking for a way to "split" the dimensions column into multiple columns :
desired_df
Width Diameter Height Volume
0 2 null null null
1 null 1.2 10 null
2 null 3.4 null 10
Note that the columns order don't matter. If you have any hints that could help me I would appreciate it, thanks !
CodePudding user response:
Try:
x = df["Dimensions"].str.extractall(r'([^\s"] )\s*:\s*(\d \.?\d*)').droplevel(1)
x = x.pivot(columns=0, values=1)
x.columns.name = None
print(x)
Prints:
Diameter Height Volume Width
0 NaN NaN NaN 2
1 1.2 10 NaN NaN
2 3.4 NaN 10 NaN