How to calculate the mean of a specific value in columns in Python?

I'm trying to drop columns that have too many missing values. The missing values are encoded as 99 or 90, so how can I count how often those values occur in each column?

Here is the code that is supposed to drop the columns that exceed the threshold value:

threshold = 0.6

data = data[data.columns[[data.column == 90 or data.column == 99].count().mean() < threshold]]

I'm not very familiar with pandas, so any suggestions would be helpful.

CodePudding user response:

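You're almost there. Use isin to flag the sentinel values, then take the mean of the resulting boolean mask to get the fraction of flagged cells, and keep only what falls below the threshold. Note that mean(axis=1) averages across each row, so this version drops rows, as in the example; averaging with axis=0 and selecting with data.loc[:, mask] would filter columns instead:

threshold = 0.6
out = data[data.isin([90, 99]).mean(axis=1).lt(threshold)]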

Example input:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
4  99  99   0  90  99  # to drop
5  99   0   0   0  99
6   0   0  99   0  90
7   0  90  99   0  90  # to drop
8  99  90   0  90   0  # to drop
9   0  99   0   0   0

Output:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
5  99   0   0   0  99
6   0   0  99   0  90
9   0  99   0   0   0
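
Since the question asks about columns while the example filters rows, here is a minimal sketch that rebuilds the example frame and applies the same mask along both axes. The names missing, kept_rows and kept_cols are only illustrative, and it assumes 90 and 99 never occur as real values in the data.

```python
import pandas as pd

# Example frame copied from the input table above.
data = pd.DataFrame([
    [0, 90, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 90, 0, 99, 0],
    [90, 0, 0, 0, 0],
    [99, 99, 0, 90, 99],
    [99, 0, 0, 0, 99],
    [0, 0, 99, 0, 90],
    [0, 90, 99, 0, 90],
    [99, 90, 0, 90, 0],
    [0, 99, 0, 0, 0],
])

threshold = 0.6

# True wherever a cell holds one of the sentinel "missing" codes.
missing = data.isin([90, 99])

# Row-wise: the mean of a boolean row is the fraction of missing cells in it.
kept_rows = data[missing.mean(axis=1) < threshold]

# Column-wise: same idea along axis=0, then select columns with .loc.
kept_cols = data.loc[:, missing.mean(axis=0) < threshold]

print(kept_rows)
print(kept_cols)
```

With this data, column 1 holds a sentinel in 6 of 10 rows, so it is the only column removed. Because the comparison is strict, a row or column sitting exactly at the threshold is dropped; use <= if it should be kept.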