Splitting a column with more than one kind of separator

Time:12-16

I'm working with a dataset in which I need to split one of the columns into three new columns. The column looks like this:

df2['Measurements'].value_counts(normalize=True)
6.9x6.87x4.16     0.000834
5.42x5.39x3.36    0.000834
6.83x6.86x4.23    0.000834
5.43x5.4x3.38     0.000834
6.11x6.16x3.85    0.000834
                    ...   
5.05x5.07x3.01    0.000278
6.46x6.43x3.94    0.000278
7.7x7.73x4.78     0.000278
8.37x8.41x5.19    0.000278
6.3x6.26x4.1      0.000278
Name: Measurements, Length: 3450, dtype: float64

Then I tried to split it on 'x' with the code below:

df2[['Length-mm','Width-mm','Depth-mm']] = df2['Measurements'].str.split('x', n=2, expand=True)
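A minimal reproduction of what that line does, assuming a few hypothetical rows where one uses "*" instead of "x". A row without any "x" is not split at all: the whole string lands in the first column and the other two are filled with missing values.

```python
import pandas as pd

# Hypothetical sample: two rows use "x", one uses "*" as the separator
df2 = pd.DataFrame({"Measurements": ["6.9x6.87x4.16", "5.42x5.39x3.36", "5.57*5.55*3.58"]})

# Splitting on "x" alone leaves the "*"-separated row unsplit
df2[["Length-mm", "Width-mm", "Depth-mm"]] = df2["Measurements"].str.split("x", n=2, expand=True)
print(df2)
```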

But when I checked the value counts of the new column, it showed that there is another separator besides "x".

df2['Length-mm'].value_counts(normalize=True)
6.44              0.008058
6.8               0.006946
6.47              0.006946
6.43              0.006669
6.48              0.006669
                    ...   
5.57*5.55*3.58    0.000278
7.85*7.89*4.78    0.000278
8.98              0.000278
5.54*5.51         0.000278
4.62              0.000278
Name: Length-mm, Length: 722, dtype: float64

As you can see, some rows have their three values separated by "*" instead. I'd like to create these new columns, so is it possible to split on more than one separator, and if so, how?
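Before choosing a pattern, it can help to list every separator character that actually occurs instead of spotting them in the value counts. This is a sketch that assumes each value contains only digits, decimal points and separators. The sample rows are hypothetical, and `str.findall` collects every character that is neither a digit nor a ".".

```python
import pandas as pd

# Hypothetical sample standing in for df2["Measurements"]
measurements = pd.Series(["6.9x6.87x4.16", "5.57*5.55*3.58", "5.54*5.51x3.4"])

# Every character that is not a digit or a decimal point is treated as a separator;
# explode() flattens the per-row lists so value_counts() can tally them
separators = measurements.str.findall(r"[^\d.]").explode().value_counts()
print(separators)
```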

Answer:

str.split accepts a regular expression as the separator, so a single pattern can match both "x" and "*". Your line should be:

df2[['Length-mm','Width-mm','Depth-mm']] = df2['Measurements'].str.split(r'x|\*', n=2, expand=True)
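A self-contained sketch of that line on hypothetical sample data, including a row that mixes both separators. The split pieces are still strings, so the example also converts them to floats, assuming every piece is a valid number.

```python
import pandas as pd

# Hypothetical sample mixing both separators
df2 = pd.DataFrame({"Measurements": ["6.9x6.87x4.16", "5.57*5.55*3.58", "5.54*5.51x3.4"]})

# r"x|\*" matches either "x" or a literal "*"; n=2 caps the result at three parts
df2[["Length-mm", "Width-mm", "Depth-mm"]] = df2["Measurements"].str.split(r"x|\*", n=2, expand=True)

# The pieces are strings; convert them to floats for numeric work
dims = ["Length-mm", "Width-mm", "Depth-mm"]
df2[dims] = df2[dims].astype(float)
print(df2)
```

If some rows might not split into three clean numbers, pd.to_numeric(..., errors="coerce") on each column is a more forgiving alternative to astype(float), turning unparsable values into NaN instead of raising.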

Note that you can chain as many separators as you want with | (the OR operator in regex syntax). The * has to be escaped with a \ because it is a special character in regex syntax.
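When the list of separators grows, one option is to build the pattern from a Python list and let re.escape add the backslashes. This is a sketch: the separator list (including an uppercase "X") and the sample rows are hypothetical.

```python
import re

import pandas as pd

# Hypothetical list of separators; re.escape makes "*" (and any other
# regex metacharacter) match literally, so no manual backslashes are needed
separators = ["x", "*", "X"]
pattern = "|".join(re.escape(sep) for sep in separators)

measurements = pd.Series(["6.9x6.87x4.16", "5.57*5.55*3.58", "7.7X7.73X4.78"])
# The pattern is longer than one character, so str.split treats it as a regex
parts = measurements.str.split(pattern, n=2, expand=True)
print(pattern)
print(parts)
```

For single-character separators, a character class such as r"[x*X]" is an equivalent pattern; inside square brackets the * needs no escaping. On pandas 1.4 or later you can also pass regex=True to str.split to state explicitly that the pattern is a regular expression.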
