I'm working with a dataset in which I ran into the following situation:
df2['Shape'].value_counts(normalize=True)
Round 0.574907
Princess 0.093665
Oval 0.082609
Emerald 0.068820
Radiant 0.059752
Pear 0.041739
Marquise 0.029938
Asscher 0.024099
Cushion 0.010807
Marwuise 0.005342
Uncut 0.004720
Marquis 0.003602
Name: Shape, dtype: float64
My goal is to merge the categories 'Marquis' and 'Marwuise' into 'Marquise'. How can I combine them?
Answer:
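Since you didn't state any restrictions, a quick fix is to overwrite the misspelled entries directly:
# Chained indexing like df2['Shape'][mask] = ... may not write back to df2
# (SettingWithCopyWarning / Copy-on-Write), so assign through .loc instead
df2.loc[df2['Shape'] == 'Marquis', 'Shape'] = 'Marquise'
df2.loc[df2['Shape'] == 'Marwuise', 'Shape'] = 'Marquise'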
Then run the count again to check the result:
df2['Shape'].value_counts(normalize=True)
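To confirm that the merge simply pools the shares, here is a minimal self-contained sketch on a small made-up Shape column (the toy values are hypothetical, not the question's data): after the replacement, the 'Marquise' proportion equals the sum of the three original proportions.

```python
import pandas as pd

# Hypothetical toy data standing in for df2; only the 'Shape' column matters here
df2 = pd.DataFrame({"Shape": ["Round", "Round", "Round", "Marquise", "Marquis", "Marwuise"]})

before = df2["Shape"].value_counts(normalize=True)

# Overwrite each misspelling through .loc so the change is written back to df2
df2.loc[df2["Shape"] == "Marquis", "Shape"] = "Marquise"
df2.loc[df2["Shape"] == "Marwuise", "Shape"] = "Marquise"

after = df2["Shape"].value_counts(normalize=True)

# The merged share should equal the sum of the three original shares
expected = before[["Marquise", "Marquis", "Marwuise"]].sum()
print(after)
print(after["Marquise"], expected)
```

Applied to the question's numbers, the merged 'Marquise' share would be 0.029938 + 0.005342 + 0.003602 = 0.038882.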
Answer:
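Series.replace accepts a dict that maps each old value to its new value, so several variants can be merged in one call. For example:
>>> import pandas as pd
>>> s = ['a', 'A', 'a', 'b', 'c', 'A', 'd', 'A']
>>> s = pd.Series(s).replace({'a': 'changed', 'A': 'changed'})
>>> s
0    changed
1    changed
2    changed
3          b
4          c
5    changed
6          d
7    changed
dtype: object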
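Applied to the question's column, the same dict-based replace maps both misspellings to 'Marquise' in a single call. A minimal sketch, again on hypothetical toy data in place of the real df2:

```python
import pandas as pd

# Hypothetical stand-in for the question's df2
df2 = pd.DataFrame({"Shape": ["Round", "Marquis", "Oval", "Marwuise", "Marquise"]})

# Series.replace with a dict maps each old value to its new value;
# assigning the result back avoids relying on inplace modification
df2["Shape"] = df2["Shape"].replace({"Marquis": "Marquise", "Marwuise": "Marquise"})

print(df2["Shape"].value_counts(normalize=True))
```

Assigning the returned Series back to the column keeps the change explicit and works the same way regardless of pandas' copy/view behavior.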
Answer:
You can take advantage of boolean row masking inside pandas DataFrame.loc. In the first line below, the mask selects the rows whose shape is equal to 'Marquis'; passing that mask together with the 'Shape' column label to DataFrame.loc assigns 'Marquise' to exactly those cells. The second line repeats the same step for the other misspelling.
df2.loc[df2['Shape'] == 'Marquis', 'Shape'] = 'Marquise'
df2.loc[df2['Shape'] == 'Marwuise', 'Shape'] = 'Marquise'
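The two .loc statements can also be collapsed into one by building the mask with Series.isin, which is True for any value contained in the given list. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for df2
df2 = pd.DataFrame({"Shape": ["Marquis", "Pear", "Marwuise", "Round"]})

# Rows whose shape is any of the misspelled variants
misspelled = df2["Shape"].isin(["Marquis", "Marwuise"])

# Assign the canonical label to the 'Shape' column of those rows only
df2.loc[misspelled, "Shape"] = "Marquise"

print(df2["Shape"].tolist())
```

Adding another variant later only means extending the list passed to isin.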