I'm working with a dataset and ran into the following situation:
df2['Shape'].value_counts(normalize=True)
Round 0.561806
Princess 0.090057
Emerald 0.070318
Oval 0.070072
Radiant 0.058722
Pear 0.044658
Marquise 0.028374
Asscher 0.023933
Oval 0.015297
ROUND 0.013570
Cushion 0.009623
Marwuise 0.005922
Marquis 0.003948
Uncut 0.003701
Name: Shape, dtype: float64
My goal is to combine the similar values in this column (e.g., Round and ROUND; Oval and Oval) into a single value. How can I combine them?
CodePudding user response:
Looks like you just want to standardize the names. You can lower or capitalize the shape names before running value_counts:
df2['Shape'].str.capitalize().value_counts(normalize=True)
output:
Round 0.575376
Princess 0.090057
Oval 0.085369
Emerald 0.070318
Radiant 0.058722
Pear 0.044658
Marquise 0.028374
Asscher 0.023933
Cushion 0.009623
Marwuise 0.005922
Marquis 0.003948
Uncut 0.003701
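Note that the line above only changes the counts, not the column itself. If you want the merge to stick, you can assign the result back. A minimal sketch, assuming a small made-up frame in place of your df2 (the sample values are illustrative only):

```python
import pandas as pd

# Toy stand-in for df2; these values are made up for illustration.
df2 = pd.DataFrame({"Shape": ["Round", "ROUND", "Oval", "round", "Pear"]})

# Overwrite the column with the capitalized form so case variants collapse.
df2["Shape"] = df2["Shape"].str.capitalize()

counts = df2["Shape"].value_counts(normalize=True)
print(counts)
```

After the assignment, any later groupby or value_counts on the column sees a single "Round" label.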
CodePudding user response:
Maybe there are trailing spaces in one of the "Oval" strings. If so, strip them as well:
df2['Shape'].str.capitalize().str.rstrip().value_counts(normalize=True)
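Case and whitespace cleanup will not merge the misspelled labels (Marwuise, Marquis). One option is an explicit mapping with Series.replace. This is only a sketch: the toy data and the fixes dict are assumptions, and which spellings really mean the same shape is for you to decide.

```python
import pandas as pd

# Toy stand-in for df2; these values are made up for illustration.
df2 = pd.DataFrame(
    {"Shape": ["Marquise", "Marwuise", "Marquis", "Oval ", "Oval", "ROUND"]}
)

# Assumed corrections: which spellings refer to the same shape is a guess.
fixes = {"Marwuise": "Marquise", "Marquis": "Marquise"}

cleaned = (
    df2["Shape"]
    .str.strip()        # drop leading/trailing whitespace
    .str.capitalize()   # unify case
    .replace(fixes)     # map known misspellings to one label
)
print(cleaned.value_counts(normalize=True))
```

Series.replace with a dict matches whole values, so "Marquis" is replaced without touching entries that are already "Marquise".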