How do I convert categorical columns into numerical ones when a column has multiple values

Time:04-04

I am trying to convert the following columns into numerical values, but I am stuck. I am not sure how to use the pandas apply method when the column has more than two distinct values.

from seaborn import load_dataset

dfdi = load_dataset("diamonds")
dfdi

There are three columns that hold categorical data:
dfdi.cut.value_counts().sort_index()
Ideal 21551
Premium 13791
Very Good 12082
Good 4906
Fair 1610
Name: cut, dtype: int64

dfdi.color.value_counts().sort_index()
D 6775
E 9797
F 9542
G 11292
H 8304
I 5422
J 2808
Name: color, dtype: int64

dfdi.clarity.value_counts().sort_index()
IF 1790
VVS1 3655
VVS2 5066
VS1 8171
VS2 12258
SI1 13065
SI2 9194
I1 741
Name: clarity, dtype: int64

I can do the conversion when the column is binary. I tried adding an elif, but I got a syntax error.

dfdi.cut = dfdi.cut.apply(lambda x: 5 if x == "Ideal" else 0)

My failed attempt with elif:
dfdi.cut = dfdi.cut.apply(lambda x: 5 if x == "Ideal" elif 4 if x == "Premium" elif 3 if x == "Very Good" elif 2 if x == "Good" else 1)

I think a nested lambda would work here, but I am not sure of the syntax.

CodePudding user response:

A map is what you need:

dfdi["cut"] = dfdi["cut"].map({
    "Ideal": 5,
    "Premium": 4,
    "Very Good": 3,
    "Good": 2,
    "Fair": 1
})
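
If it helps, the same idea should extend to the other two columns. Below is a minimal sketch on a small hand-made frame (the name sample and its rows are made up so it runs without seaborn). The category orders are copied from the value_counts() output in the question; the scores, with the first listed category getting the highest number, are an assumption you may want to change. Keep in mind that map returns NaN for any value missing from the dictionary.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (not the real dataset).
sample = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Very Good", "Good", "Fair"],
    "color": ["D", "E", "F", "G", "J"],
    "clarity": ["IF", "VVS1", "VS2", "SI1", "I1"],
})

# Category orders as listed in the question's value_counts() output.
# Assumption: the first listed category should get the highest score.
orders = {
    "cut": ["Ideal", "Premium", "Very Good", "Good", "Fair"],
    "color": ["D", "E", "F", "G", "H", "I", "J"],
    "clarity": ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"],
}

encoded = sample.copy()
for column, categories in orders.items():
    # Build {category: score}: first category -> len(categories), last -> 1.
    scores = {name: len(categories) - i for i, name in enumerate(categories)}
    encoded[column] = sample[column].map(scores)

print(encoded)
```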

CodePudding user response:

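# List the categories explicitly: unique() returns them in order of appearance, not by quality
ds = dict(zip(["Ideal", "Premium", "Very Good", "Good", "Fair"], [5, 4, 3, 2, 1]))
dfdi["cut"] = dfdi["cut"].apply(lambda x: ds[x])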

Simply use a dictionary whose keys are the unique values and whose values are the numbers you want. Also, the ternary operator in Python looks like this:

data = value1 if condition1 else value2 if condition2 else value3

There is no elif in a conditional expression; put a new ternary in the else part instead.
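
Applied to the cut values from the question, the chained form might look like the sketch below. The names rank_cut, cuts and ranks are made up for illustration, and a plain list stands in for the column so it runs without pandas.

```python
# Chained conditional expressions: each `else` introduces the next condition.
rank_cut = lambda x: (
    5 if x == "Ideal"
    else 4 if x == "Premium"
    else 3 if x == "Very Good"
    else 2 if x == "Good"
    else 1  # everything else, including "Fair"
)

cuts = ["Ideal", "Premium", "Very Good", "Good", "Fair"]
ranks = [rank_cut(c) for c in cuts]
print(ranks)
```

Passing rank_cut to dfdi.cut.apply should behave the same way. The final else catches any unexpected value too, so a typo in the data would silently become 1.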
