I'm working with a dataset in which I need to split one of the columns into three new columns. The column looks like this:
df2['Measurements'].value_counts(normalize=True)
6.9x6.87x4.16 0.000834
5.42x5.39x3.36 0.000834
6.83x6.86x4.23 0.000834
5.43x5.4x3.38 0.000834
6.11x6.16x3.85 0.000834
...
5.05x5.07x3.01 0.000278
6.46x6.43x3.94 0.000278
7.7x7.73x4.78 0.000278
8.37x8.41x5.19 0.000278
6.3x6.26x4.1 0.000278
Name: Measurements, Length: 3450, dtype: float64
Then I tried to split it on "x" with the code below:
df2[['Length-mm','Width-mm','Depth-mm']] = df2['Measurements'].str.split('x', n=2, expand=True)
But when I checked the value counts of the new column, I saw that there is another separator besides "x":
df2['Length-mm'].value_counts(normalize=True)
6.44 0.008058
6.8 0.006946
6.47 0.006946
6.43 0.006669
6.48 0.006669
...
5.57*5.55*3.58 0.000278
7.85*7.89*4.78 0.000278
8.98 0.000278
5.54*5.51 0.000278
4.62 0.000278
Name: Length-mm, Length: 722, dtype: float64
As you can see, some rows have three values separated by "*". I'd still like to create the three new columns, so I'd like to know whether it is possible to split using more than one separator and, if so, how.
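Before splitting, it can help to list exactly which rows deviate from the "x"-only format. The following is a minimal sketch: the small DataFrame is a hypothetical stand-in for df2 built from values shown above, and the `expected` pattern is an assumption about what a well-formed value looks like (three decimal numbers joined by a lowercase "x"):

```python
import pandas as pd

# Hypothetical stand-in for df2, using values that appear in the outputs above
df2 = pd.DataFrame({"Measurements": [
    "6.9x6.87x4.16",
    "5.57*5.55*3.58",
    "7.85*7.89*4.78",
    "6.3x6.26x4.1",
]})

# Assumed well-formed value: three decimal numbers joined by a lowercase "x"
expected = r"\d+(?:\.\d+)?x\d+(?:\.\d+)?x\d+(?:\.\d+)?"

# na=False makes missing values count as "not matching", so they are listed too
odd_rows = df2.loc[~df2["Measurements"].str.fullmatch(expected, na=False), "Measurements"]
print(odd_rows.tolist())
```

Running this on the full column shows every separator variant present before committing to a split pattern.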
Answer:
You can use a regular expression with str.split. Your line should be:
df2[['Length-mm','Width-mm','Depth-mm']] = df2['Measurements'].str.split(r'x|\*', n=2, expand=True)
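To check the line above end to end, here is a small self-contained sketch. The DataFrame is a hypothetical stand-in for df2 using values from the question, and the final conversion to float is an extra step not in the original answer. It assumes that the split pieces should be numeric. It relies on pandas treating a pattern longer than one character as a regular expression, which is its default behaviour:

```python
import pandas as pd

# Hypothetical stand-in for df2, using values shown in the question
df2 = pd.DataFrame({"Measurements": ["6.9x6.87x4.16", "5.57*5.55*3.58", "6.3x6.26x4.1"]})

cols = ["Length-mm", "Width-mm", "Depth-mm"]

# Split on either "x" or a literal "*" (escaped because * is a regex metacharacter)
df2[cols] = df2["Measurements"].str.split(r"x|\*", n=2, expand=True)

# The pieces are still strings; convert them to floats for numeric work (assumption)
df2[cols] = df2[cols].astype(float)
print(df2)
```

If a row yields fewer than three pieces, the missing columns are filled with None, which becomes NaN after the float conversion.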
Note that you can combine as many separators as you want with | (which means OR in regex syntax). The * has to be preceded by a \ because it is a special character in regex syntax.
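If the list of separators grows, one option is to build the pattern programmatically so that escaping is handled for you. The following is a sketch under assumptions: the uppercase "X" separator and the sample value "8.37X8.41X5.19" are hypothetical and do not appear in the question's output.

```python
import re
import pandas as pd

# Hypothetical list of separators; "X" is an assumed extra separator for illustration
separators = ["x", "*", "X"]

# re.escape adds the backslash before special characters such as "*",
# and "|" joins the alternatives into a single OR pattern
pattern = "|".join(re.escape(sep) for sep in separators)
print(pattern)

# The last value is hypothetical, to exercise the "X" separator
s = pd.Series(["6.9x6.87x4.16", "5.57*5.55*3.58", "8.37X8.41X5.19"])
parts = s.str.split(pattern, n=2, expand=True)
print(parts)
```

A character class such as r'[x*X]' is an equivalent alternative for single-character separators. Inside the brackets the * loses its special meaning and needs no escaping.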