Rename items from a column in pandas

I'm working with a dataset in which I ran into the following situation:

df2['Shape'].value_counts(normalize=True)

Round       0.574907
Princess    0.093665
Oval        0.082609
Emerald     0.068820
Radiant     0.059752
Pear        0.041739
Marquise    0.029938
Asscher     0.024099
Cushion     0.010807
Marwuise    0.005342
Uncut       0.004720
Marquis     0.003602
Name: Shape, dtype: float64

and my goal is to have the values 'Marquis' and 'Marwuise' counted under 'Marquise'. How can I combine them?

CodePudding user response：

Since you didn't state any restrictions, a quick fix is to first change the entries the way you want, as shown below:

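# Caution: chained indexing like this can trigger SettingWithCopyWarning and
# does not modify df2 when pandas Copy-on-Write is enabled.
df2['Shape'][df2['Shape'] == 'Marquis'] = 'Marquise'
df2['Shape'][df2['Shape'] == 'Marwuise'] = 'Marquise'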

Now run this command:

df2['Shape'].value_counts(normalize=True)

CodePudding user response：

>>> import pandas as pd
>>> s = ['a', 'A', 'a', 'b', 'c', 'A', 'd', 'A']
>>> s = pd.Series(s).replace({'a': 'changed', 'A': 'changed'})
>>> s

0    changed
1    changed
2    changed
3          b
4          c
5    changed
6          d
7    changed
dtype: object
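
Applied to the column from the question, the same idea might look like the sketch below. The toy df2 is made-up data standing in for the real dataset, and the misspellings are taken from the value_counts output in the question ('Marwuise' as it is spelled there).

```python
import pandas as pd

# Toy stand-in for the real df2 (made-up rows, for illustration only)
df2 = pd.DataFrame({'Shape': ['Round', 'Marquise', 'Marwuise', 'Marquis', 'Oval']})

# Map each misspelling to the canonical label; other values are left unchanged
fixes = {'Marquis': 'Marquise', 'Marwuise': 'Marquise'}
df2['Shape'] = df2['Shape'].replace(fixes)

counts = df2['Shape'].value_counts()
print(counts)
```

With a dict, replace matches whole values rather than substrings, so the existing 'Marquise' entries are not touched by the 'Marquis' key.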

CodePudding user response：

You can take advantage of row masking with pandas DataFrame.loc.

In the first line, the mask selects the rows whose shape equals 'Marquis'. Passing that mask to DataFrame.loc together with the 'Shape' column lets you assign the corrected value to exactly those cells. Then repeat the same code for 'Marwuise'.

df2.loc[df2['Shape'] == 'Marquis', 'Shape'] = 'Marquise'
df2.loc[df2['Shape'] == 'Marwuise', 'Shape'] = 'Marquise'
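
The two assignments can also be folded into one by building the mask with Series.isin. A minimal sketch, assuming a made-up toy df2 and the spelling 'Marwuise' as it appears in the question's output:

```python
import pandas as pd

# Toy stand-in for the real df2 (made-up rows, for illustration only)
df2 = pd.DataFrame({'Shape': ['Round', 'Marquis', 'Marwuise', 'Marquise', 'Pear']})

# True for rows whose Shape is one of the misspellings
mask = df2['Shape'].isin(['Marquis', 'Marwuise'])
df2.loc[mask, 'Shape'] = 'Marquise'

shares = df2['Shape'].value_counts(normalize=True)
print(shares)
```

This keeps the list of misspellings in one place, which is handy if more variants turn up later.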