How to combine variables from a column in only one variable?

Time:12-10

I'm working with a dataset in which I ran into the following situation:

df2['Shape'].value_counts(normalize=True)
Round       0.561806
Princess    0.090057
Emerald     0.070318
Oval        0.070072
Radiant     0.058722
Pear        0.044658
Marquise    0.028374
Asscher     0.023933
Oval        0.015297
ROUND       0.013570
Cushion     0.009623
Marwuise    0.005922
Marquis     0.003948
Uncut       0.003701
Name: Shape, dtype: float64

My goal is to combine the similar values in this column (e.g., Round and ROUND; Oval and Oval) into a single value. How can I combine them?

CodePudding user response:

Looks like you just want to standardize the names. You can lowercase or capitalize the shape names before running value_counts:

df2['Shape'].str.capitalize().value_counts(normalize=True)

output:

Round     0.575376
Princess  0.090057
Oval      0.085369
Emerald   0.070318
Radiant   0.058722
Pear      0.044658
Marquise  0.028374
Asscher   0.023933
Cushion   0.009623
Marwuise  0.005922
Marquis   0.003948
Uncut     0.003701
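
Note that the line above only changes the values used for counting; the column in df2 itself is left as it was. To make the merge permanent, the normalized values can be assigned back to the column. A minimal sketch, assuming a small made-up DataFrame in place of the real df2 (the sample rows are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for df2; the real data is not shown in the question.
df2 = pd.DataFrame({"Shape": ["Round", "ROUND", "round", "Oval", "Princess"]})

# str.capitalize() upper-cases the first character and lower-cases the rest,
# so "ROUND" and "round" both become "Round".
df2["Shape"] = df2["Shape"].str.capitalize()

# The counts now reflect the merged labels, and the column itself is cleaned.
counts = df2["Shape"].value_counts(normalize=True)
print(counts)
```

After the assignment, any later groupby or filter on Shape sees a single "Round" label.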

CodePudding user response:

One of the "Oval" strings may have trailing spaces. If so, strip them as well:

df2['Shape'].str.capitalize().str.rstrip().value_counts(normalize=True)
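
Case and whitespace fixes still leave the spelling variants (Marwuise, Marquis) as separate labels. A rough sketch of one way to fold them in, assuming both are typos of Marquise (that is a guess about the data, and the `fixes` mapping and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for df2, including a trailing-space "Oval " and typos.
df2 = pd.DataFrame(
    {"Shape": ["Oval", "Oval ", "ROUND", "Round", "Marwuise", "Marquis", "Marquise"]}
)

# Printing the unique labels as a list shows them quoted,
# which makes stray whitespace such as 'Oval ' visible.
print(df2["Shape"].unique().tolist())

# Assumed corrections: adjust this mapping to whatever the data really needs.
fixes = {"Marwuise": "Marquise", "Marquis": "Marquise"}

df2["Shape"] = (
    df2["Shape"]
    .str.strip()        # drop leading and trailing whitespace
    .str.capitalize()   # unify case: "ROUND" -> "Round"
    .replace(fixes)     # map known misspellings to the canonical label
)

print(df2["Shape"].value_counts(normalize=True))
```

An explicit mapping is used here instead of fuzzy matching so that every merge is visible and easy to review.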