Onehotencoded dataframe won't join with original dataframe in for loop

Time:04-14

I have a dataframe:

import pandas as pd

train_df = pd.DataFrame({'home':['A','A','B','C','C'],'dest':['X','Y','Y','X','Y']})

If I do:

from sklearn.preprocessing import OneHotEncoder

train_df[['home','dest']] = train_df[['home','dest']].astype('category')

onehotenc = OneHotEncoder(handle_unknown='ignore')

encoded_df = pd.DataFrame(onehotenc.fit_transform(train_df[['home','dest']]).toarray())
encoded_df.columns = onehotenc.get_feature_names_out()
train_df = train_df.join(encoded_df)

I do get the train_df dataframe with encoded_df columns added on the right. However, if I do

from sklearn.preprocessing import OneHotEncoder

for df in [train_df]:
    df[['home','dest']] = df[['home','dest']].astype('category')

    onehotenc = OneHotEncoder(handle_unknown='ignore')

    encoded_df = pd.DataFrame(onehotenc.fit_transform(df[['home','dest']]).toarray())
    encoded_df.columns = onehotenc.get_feature_names_out()
    df = df.join(encoded_df)

train_df does not get the encoded columns. Why does the assignment not work in the for loop case? I need to do similar encoding on multiple dataframes and add the encoded columns to each of them. How can I do that in a for loop?

Answer:

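A Python for ... in loop binds each element of the list to the loop variable, so inside the loop df is just another name for the same DataFrame object as train_df. In-place changes such as df[['home','dest']] = ... therefore do affect train_df. However, df.join(encoded_df) returns a new DataFrame, and df = df.join(encoded_df) only rebinds the name df to that new object; neither the list element nor train_df is changed.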
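A small demonstration of the difference between mutating the object the loop variable refers to and rebinding the loop variable. The home_lower and extra columns are made up purely for illustration:

```python
import pandas as pd

train_df = pd.DataFrame({'home': ['A', 'A', 'B', 'C', 'C'],
                         'dest': ['X', 'Y', 'Y', 'X', 'Y']})

for df in [train_df]:
    # df and train_df are two names for the same DataFrame object here
    print(df is train_df)  # True
    # In-place change: setting a column mutates the shared object
    df['home_lower'] = df['home'].str.lower()
    # Rebinding: join() returns a NEW DataFrame; only the name df moves to it
    df = df.join(pd.DataFrame({'extra': range(5)}))
    print(df is train_df)  # False

print(list(train_df.columns))  # ['home', 'dest', 'home_lower']
```

After the loop, train_df has home_lower (added in place) but not extra, which only exists on the new object that df was rebound to.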

You can either append each modified DataFrame to a new list, or put it back into the original list in place of the old one:

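dfs = []

for df in [train_df]:
    ...  # encoding steps from the question, ending with df = df.join(encoded_df)
    dfs.append(df)

# or

dfs = [train_df]

for idx, df in enumerate(dfs):
    ...  # encoding steps from the question, ending with df = df.join(encoded_df)
    dfs[idx] = df

# either way, take the results back out of the list
train_df = dfs[0]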