Home > Software design >  Pandas Apply transformation to multiple columns but do not discard other columns?
Pandas Apply transformation to multiple columns but do not discard other columns?

Time:11-06

I have a table of an "Id" column and multiple integer columns that I want to convert to categorical variables. Therefore, I want to apply this transformation only to those multiple integer columns, but leave the ID column unchanged.

All the other methods involve dropping the ID column. How do I do this without dropping the ID column?

This is the current code i have:

df= df.loc[:, df.columns != 'Id'].apply(lambda x: x.astype('category'))

Sample dataframe:

{'Id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
 'Foundation': {0: 2, 1: 1, 2: 2, 3: 0, 4: 2},
 'GarageFinish': {0: 1, 1: 1, 2: 1, 3: 2, 4: 1},
 'LandSlope': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'LotConfig': {0: 4, 1: 2, 2: 4, 3: 0, 4: 2},
 'GarageQual': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
 'GarageCond': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
 'LandContour': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
 'Utilities': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'GarageType': {0: 1, 1: 1, 2: 1, 3: 5, 4: 1},
 'LotShape': {0: 3, 1: 3, 2: 0, 3: 0, 4: 0},
 'Alley': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
 'Street': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'PoolQC': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
 'Fence': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
 'MiscFeature': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
 'MSZoning': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
 'SaleType': {0: 8, 1: 8, 2: 8, 3: 8, 4: 8},
 'PavedDrive': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
 'FireplaceQu': {0: 5, 1: 4, 2: 4, 3: 2, 4: 4},
 'Condition1': {0: 2, 1: 1, 2: 2, 3: 2, 4: 2},
 'Functional': {0: 6, 1: 6, 2: 6, 3: 6, 4: 6},
 'BsmtQual': {0: 2, 1: 2, 2: 2, 3: 3, 4: 2},
 'BsmtCond': {0: 3, 1: 3, 2: 3, 3: 1, 4: 3},
 'BsmtExposure': {0: 3, 1: 1, 2: 2, 3: 3, 4: 0},
 'BsmtFinType1': {0: 2, 1: 0, 2: 2, 3: 0, 4: 2},
 'ExterQual': {0: 2, 1: 3, 2: 2, 3: 3, 4: 2},
 'BsmtFinType2': {0: 5, 1: 5, 2: 5, 3: 5, 4: 5},
 'MasVnrType': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1},
 'Exterior2nd': {0: 13, 1: 8, 2: 13, 3: 15, 4: 13},
 'Heating': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'Neighborhood': {0: 5, 1: 24, 2: 5, 3: 6, 4: 15},
 'SaleCondition': {0: 4, 1: 4, 2: 4, 3: 0, 4: 4},
 'Electrical': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
 'Exterior1st': {0: 12, 1: 8, 2: 12, 3: 13, 4: 12},
 'RoofMatl': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'RoofStyle': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'HouseStyle': {0: 5, 1: 2, 2: 5, 3: 5, 4: 5},
 'BldgType': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'Condition2': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
 'KitchenQual': {0: 2, 1: 3, 2: 2, 3: 2, 4: 2},
 'ExterCond': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
 'CentralAir': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'HeatingQC': {0: 0, 1: 0, 2: 0, 3: 2, 4: 0}}

CodePudding user response:

One way to do this is by isolating the Id column and then joining the converted columns:

df = df[['Id']].join(
    df.loc[:, df.columns != 'Id'].astype('category')
)

CodePudding user response:

Another way is to try:

df = df.groupby('Id').transform(lambda x: pd.Categorical(x)).reset_index(names = 'id')
  • Related