Splitting a column with more than one kind of separator

I'm working with a dataset in which one of the columns needs to be split into three new columns. The column looks like this:

df2['Measurements'].value_counts(normalize=True)

6.9x6.87x4.16     0.000834
5.42x5.39x3.36    0.000834
6.83x6.86x4.23    0.000834
5.43x5.4x3.38     0.000834
6.11x6.16x3.85    0.000834
                    ...   
5.05x5.07x3.01    0.000278
6.46x6.43x3.94    0.000278
7.7x7.73x4.78     0.000278
8.37x8.41x5.19    0.000278
6.3x6.26x4.1      0.000278
Name: Measurements, Length: 3450, dtype: float64

Then I tried to split it on "x" with the code below:

df2[['Length-mm','Width-mm','Depth-mm']] = df2['Measurements'].str.split('x', n=2, expand=True)
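To see why this leaves some rows unsplit, here is a minimal sketch on a small hypothetical DataFrame built from a few values in the question's output (the full dataset isn't available). Because "x" is a single character, pandas treats it as a literal string, so rows that use "*" come back whole in the first column and the other two columns are left empty.

```python
import pandas as pd

# Hypothetical sample: a few values copied from the question's output,
# including rows that use "*" instead of "x".
df2 = pd.DataFrame({"Measurements": ["6.9x6.87x4.16", "5.42x5.39x3.36",
                                     "5.57*5.55*3.58", "5.54*5.51"]})

# Same call as in the question: split on the literal character "x" only.
df2[["Length-mm", "Width-mm", "Depth-mm"]] = (
    df2["Measurements"].str.split("x", n=2, expand=True)
)

# Rows with "*" keep their whole string in Length-mm; Width-mm and
# Depth-mm are missing for them.
print(df2)
```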

But when I checked the value counts of the new column, they showed that there is another separator besides "x":

df2['Length-mm'].value_counts(normalize=True)

6.44              0.008058
6.8               0.006946
6.47              0.006946
6.43              0.006669
6.48              0.006669
                    ...   
5.57*5.55*3.58    0.000278
7.85*7.89*4.78    0.000278
8.98              0.000278
5.54*5.51         0.000278
4.62              0.000278
Name: Length-mm, Length: 722, dtype: float64

As you can see, some rows have their values separated by "*" instead (e.g. "5.57*5.55*3.58", or "5.54*5.51", which has only two values). I'd still like to create these new columns: is it possible to split on more than one separator, and if so, how?
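Before choosing a pattern, it can help to list every separator the column actually uses. The sketch below assumes the same kind of hypothetical sample as above; it treats any character other than a digit or a decimal point as a separator, so stray spaces or other symbols in the real data would show up in the counts too.

```python
import pandas as pd

# Hypothetical sample rows taken from the question's output.
df2 = pd.DataFrame({"Measurements": ["6.9x6.87x4.16", "5.42x5.39x3.36",
                                     "5.57*5.55*3.58", "5.54*5.51"]})

# findall returns, per row, a list of every non-digit, non-dot character;
# explode flattens those lists so value_counts can tally each separator.
separators = (
    df2["Measurements"].str.findall(r"[^0-9.]")
    .explode()
    .value_counts()
)
print(separators)
```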

Answer:

You can pass a regular expression to str.split as the separator. Your line should be:

df2[['Length-mm','Width-mm','Depth-mm']] = df2['Measurements'].str.split(r'x|\*', n=2, expand=True)
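Applied to a small hypothetical sample built from the question's output, the regex split fills all three columns for both kinds of rows. The pd.to_numeric step is an addition not in the original answer, included because the split pieces are still strings. On pandas 1.4 or later you could also pass regex=True to str.split to make the regex interpretation explicit; here the pattern has more than one character, so it is treated as a regex by default.

```python
import pandas as pd

# Hypothetical sample rows taken from the question's output.
df2 = pd.DataFrame({"Measurements": ["6.9x6.87x4.16", "5.42x5.39x3.36",
                                     "5.57*5.55*3.58", "5.54*5.51"]})

cols = ["Length-mm", "Width-mm", "Depth-mm"]

# Split on "x" OR a literal "*" (escaped), at most twice per row.
df2[cols] = df2["Measurements"].str.split(r"x|\*", n=2, expand=True)

# Convert the string pieces to floats; a missing piece (None) becomes NaN.
df2[cols] = df2[cols].apply(pd.to_numeric)
print(df2)
```

The row 5.54*5.51 has only two values, so its Depth-mm ends up as NaN; you may want to inspect or drop such rows rather than treat them as complete measurements.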

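Note that you can combine as many separators as you want with | (which means OR in regex syntax). The * has to be escaped with a backslash (\*) because it is a special character (a quantifier) in regex syntax, and the r prefix makes the pattern a raw string so Python doesn't treat the backslash as a string escape. Rows with only two values, such as 5.54*5.51, will end up with an empty Depth-mm.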
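Because pandas uses Python's regular-expression syntax, the pattern can be checked on its own with the standard re module. This sketch shows the alternation from the answer, an equivalent character class [x*] (inside brackets the * needs no escaping), and re.escape for building the alternation from a list of literal separators. Without the backslash, the pattern x|* is rejected by re with a "nothing to repeat" error.

```python
import re

value = "5.57*5.55*3.58"

# Alternation: "x" OR a literal "*" (the backslash escapes the quantifier).
print(re.split(r"x|\*", value))           # ['5.57', '5.55', '3.58']

# Equivalent character class: inside [...] the "*" is already literal.
print(re.split(r"[x*]", "6.9x6.87x4.16"))  # ['6.9', '6.87', '4.16']

# Build the alternation from a list of separators, escaping each one.
seps = ["x", "*"]
pattern = "|".join(map(re.escape, seps))
print(re.split(pattern, "5.54*5.51"))      # ['5.54', '5.51']
```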