I have the following dataframe:
import pandas as pd
cars = ["BMV", "Mercedes", "Audi"]
customer = ["Juan", "Pepe", "Luis"]
price = [100, 200, 300]
year = [2022, 2021, 2020]
df_raw = pd.DataFrame(list(zip(cars, customer, price, year)),
                      columns=["cars", "customer", "price", "year"])
I need to one-hot encode the categorical variables cars and customer. For this I use the get_dummies method on these two columns.
numerical = ["price", "year"]
df_final = pd.concat([df_raw[numerical], pd.get_dummies(df_raw.cars),
                      pd.get_dummies(df_raw.customer)], axis=1)
Is there a way to generate these dummies dynamically, for example by putting the column names in a list and looping over them with a for loop? With only 2 columns this is simple, but if I had 30 or 60 attributes, would I have to go one by one?
CodePudding user response:
You can pass the columns to encode directly to pd.get_dummies:
pd.get_dummies(df_raw, columns=['cars', 'customer'])
   price  year  cars_Audi  cars_BMV  cars_Mercedes  customer_Juan  customer_Luis  customer_Pepe
0    100  2022          0         1              0              1              0              0
1    200  2021          0         0              1              0              0              1
2    300  2020          1         0              0              0              1              0
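For the 30 or 60 column case, the list of columns does not have to be typed by hand. The following is a minimal sketch that assumes every non-numeric column should be encoded; the name categorical is only illustrative, and select_dtypes is one possible way to build the list. Note that pandas 2.0 and later return boolean dummy columns by default, so dtype=int is passed here to get 0/1 values.

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})

# Assumption: every non-numeric column is categorical and should be encoded.
categorical = df_raw.select_dtypes(exclude="number").columns.tolist()

# One call encodes all listed columns and prefixes each dummy with its
# source column name. dtype=int forces 0/1 instead of True/False.
df_final = pd.get_dummies(df_raw, columns=categorical, dtype=int)
print(df_final)
```

If some numeric columns are really categories (for example year), they can be appended to the list before calling get_dummies.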
CodePudding user response:
One simple way is to concatenate the columns and use str.get_dummies:
cols = ['cars', 'customer']
out = df_raw.join(df_raw[cols].agg('|'.join, axis=1).str.get_dummies())
output:
       cars  customer  price  year  Audi  BMV  Juan  Luis  Mercedes  Pepe
0       BMV      Juan    100  2022     0    1     1     0         0     0
1  Mercedes      Pepe    200  2021     0    0     0     0         1     1
2      Audi      Luis    300  2020     1    0     0     1         0     0
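One thing to keep in mind with this approach is that the dummy columns carry no prefix, so the same value appearing in two different source columns would end up in a single dummy column. A possible variation, sketched here as an assumption rather than as part of the answer above, is to prepend the column name to each value before joining (prefixed is a hypothetical helper name):

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})

cols = ["cars", "customer"]

# Turn each value into "<column>_<value>", e.g. "cars_BMV".
prefixed = df_raw[cols].apply(lambda s: s.name + "_" + s)

# Join the prefixed values of each row with "|" (the default separator
# of str.get_dummies) and split them back into indicator columns.
out = df_raw.join(prefixed.agg("|".join, axis=1).str.get_dummies())
print(out)
```

This only works if the values themselves are strings and do not contain the "|" character.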
Another option is to melt and use crosstab:
df2 = df_raw[cols].reset_index().melt('index')
out = df_raw.join(pd.crosstab(df2['index'], df2['value']))
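For reference, here is the melt and crosstab variant as a self-contained sketch with each step commented. The intermediate name dummies is only illustrative. Because crosstab counts occurrences, a value that appears in two of the selected columns of the same row would yield 2 rather than 1.

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})

cols = ["cars", "customer"]

# Long format: one row per (original row index, source column, value).
df2 = df_raw[cols].reset_index().melt("index")

# Count how often each value occurs for each original row. With distinct
# values per row this is a 0/1 indicator table.
dummies = pd.crosstab(df2["index"], df2["value"])

# Align the indicator table back to the original rows by index.
out = df_raw.join(dummies)
print(out)
```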