I have a table of an "Id" column and multiple integer columns that I want to convert to categorical variables. Therefore, I want to apply this transformation only to those multiple integer columns, but leave the ID column unchanged.
All the other methods involve dropping the ID column. How do I do this without dropping the ID column?
This is the current code i have:
df= df.loc[:, df.columns != 'Id'].apply(lambda x: x.astype('category'))
Sample dataframe:
{'Id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
'Foundation': {0: 2, 1: 1, 2: 2, 3: 0, 4: 2},
'GarageFinish': {0: 1, 1: 1, 2: 1, 3: 2, 4: 1},
'LandSlope': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'LotConfig': {0: 4, 1: 2, 2: 4, 3: 0, 4: 2},
'GarageQual': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
'GarageCond': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
'LandContour': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
'Utilities': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'GarageType': {0: 1, 1: 1, 2: 1, 3: 5, 4: 1},
'LotShape': {0: 3, 1: 3, 2: 0, 3: 0, 4: 0},
'Alley': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
'Street': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'PoolQC': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
'Fence': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
'MiscFeature': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
'MSZoning': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
'SaleType': {0: 8, 1: 8, 2: 8, 3: 8, 4: 8},
'PavedDrive': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
'FireplaceQu': {0: 5, 1: 4, 2: 4, 3: 2, 4: 4},
'Condition1': {0: 2, 1: 1, 2: 2, 3: 2, 4: 2},
'Functional': {0: 6, 1: 6, 2: 6, 3: 6, 4: 6},
'BsmtQual': {0: 2, 1: 2, 2: 2, 3: 3, 4: 2},
'BsmtCond': {0: 3, 1: 3, 2: 3, 3: 1, 4: 3},
'BsmtExposure': {0: 3, 1: 1, 2: 2, 3: 3, 4: 0},
'BsmtFinType1': {0: 2, 1: 0, 2: 2, 3: 0, 4: 2},
'ExterQual': {0: 2, 1: 3, 2: 2, 3: 3, 4: 2},
'BsmtFinType2': {0: 5, 1: 5, 2: 5, 3: 5, 4: 5},
'MasVnrType': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1},
'Exterior2nd': {0: 13, 1: 8, 2: 13, 3: 15, 4: 13},
'Heating': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'Neighborhood': {0: 5, 1: 24, 2: 5, 3: 6, 4: 15},
'SaleCondition': {0: 4, 1: 4, 2: 4, 3: 0, 4: 4},
'Electrical': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
'Exterior1st': {0: 12, 1: 8, 2: 12, 3: 13, 4: 12},
'RoofMatl': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'RoofStyle': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'HouseStyle': {0: 5, 1: 2, 2: 5, 3: 5, 4: 5},
'BldgType': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'Condition2': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
'KitchenQual': {0: 2, 1: 3, 2: 2, 3: 2, 4: 2},
'ExterCond': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4},
'CentralAir': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'HeatingQC': {0: 0, 1: 0, 2: 0, 3: 2, 4: 0}}
CodePudding user response:
One way to do this is by isolating the Id
column and then joining the converted columns:
df = df[['Id']].join(
df.loc[:, df.columns != 'Id'].astype('category')
)
CodePudding user response:
Another way is to try:
df = df.groupby('Id').transform(lambda x: pd.Categorical(x)).reset_index(names = 'id')