I am trying to replace the zeros in ~15,000 columns [columns 6:14844] with the group mean, while leaving the group label column [column 1] and a few other identifying columns [columns 2:5] untouched. This is the code I came up with. It works, except that it replaces the columns I want skipped [1:5] with NaNs:
df = df.mask(df.iloc[:, np.r_[1, 6:14844]].eq(0), df.iloc[:, np.r_[1, 6:14844]].groupby('group_label').transform('mean'))
Thanks in advance.
Answer:
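You need to assign to only the subset of columns, not the whole DataFrame. df.mask aligns both the condition and the replacement values to the full DataFrame, so any column missing from them (here, the identifying columns) ends up as NaN.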
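import numpy as np

# Assign back only to the value columns; the identifying columns are never touched.
df.iloc[:, 6:14844] = (
    df.iloc[:, 6:14844]
      .mask(df.iloc[:, 6:14844].eq(0),
            df.iloc[:, np.r_[1, 6:14844]]
              .groupby('group_label')
              .transform('mean'))
)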
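
To check the behaviour on something small, here is a minimal sketch of the same idea on a toy DataFrame. The column names (sample_id, x, y) and the data are made up for illustration, and it selects the value columns by name rather than by position, which is a variation on the answer's iloc approach rather than the same code.

```python
import pandas as pd

# Toy data (hypothetical): one ID column, the group label, and two value columns.
df = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3", "s4"],
    "group_label": ["a", "a", "b", "b"],
    "x": [0.0, 4.0, 3.0, 0.0],
    "y": [2.0, 0.0, 0.0, 0.0],
})

# Stand-in for the real 6:14844 range: every column after the identifiers.
value_cols = list(df.columns[2:])

# Per-group mean of each value column, broadcast back to the original row shape.
group_means = df.groupby("group_label")[value_cols].transform("mean")

# Replace zeros with the group mean, writing back only to the value columns.
df[value_cols] = df[value_cols].mask(df[value_cols].eq(0), group_means)

print(df)
```

Note that the group mean here is computed over all rows of the group, zeros included, exactly as in the code above. A group whose values are all zero therefore stays at zero.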