Home > Mobile >  Pandas regex : Split String column into multiple integer columns
Pandas regex : Split String column into multiple integer columns

Time:11-16

I'm completely new to regex and i'm facing this challenge that is taking me hours to solve. I have the following dataframe with a string column "Dimensions":

df
                 Dimensions
0                           "Width:2 cm"
1      "Diameter: 1.2 cm, Height: 10 cm"
2        "Diameter: 3.4cm, Volume: 10cm"

I'm looking for a way to "split" the dimensions column into multiple columns :

desired_df

             Width      Diameter      Height    Volume
0              2          null         null       null
1             null         1.2          10        null
2             null         3.4         null        10

Note that the columns order don't matter. If you have any hints that could help me I would appreciate it, thanks !

CodePudding user response:

Try:

x = df["Dimensions"].str.extractall(r'([^\s"] )\s*:\s*(\d \.?\d*)').droplevel(1)
x = x.pivot(columns=0, values=1)
x.columns.name = None
print(x)

Prints:

  Diameter Height Volume Width
0      NaN    NaN    NaN     2
1      1.2     10    NaN   NaN
2      3.4    NaN     10   NaN
  • Related