How to filter dataset by difference of mean values in columns?

Time:04-27

I have a dataframe:

id1     vals1   id2    vals2
a1      [5,6]   b1     [8]
c1      [5,3]   e1     [4,5,6]

I want to calculate the mean of vals1 and the mean of vals2 for each row, and remove any row where the absolute difference between those two means is greater than 1. How can I do that?

So the desired result is:

id1     vals1   id2    vals2
c1      [5,3]   e1     [4,5,6]

CodePudding user response:

You can try something like this:

from statistics import mean

# Keep only rows where |mean(vals1) - mean(vals2)| <= 1
res = df.loc[df.apply(lambda x: mean(x.vals1) - mean(x.vals2), axis=1).abs() <= 1]

>>> res
  id1   vals1 id2      vals2
1  c1  [5, 3]  e1  [4, 5, 6]
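
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question (assuming vals1 and vals2 hold Python lists) and splits the filter into named steps; the names mean1, mean2 and diff are only illustrative and not part of the original answer.

```python
from statistics import mean

import pandas as pd

# Rebuild the frame from the question; vals1/vals2 are assumed to be lists.
df = pd.DataFrame({
    "id1": ["a1", "c1"],
    "vals1": [[5, 6], [5, 3]],
    "id2": ["b1", "e1"],
    "vals2": [[8], [4, 5, 6]],
})

# Per-row mean of each list column.
mean1 = df["vals1"].apply(mean)
mean2 = df["vals2"].apply(mean)

# Absolute difference between the two means, one value per row.
diff = (mean1 - mean2).abs()

# Keep rows whose difference is at most 1 (drop those above 1).
res = df[diff <= 1]
print(res)
```

Row a1 has means 5.5 and 8 (difference 2.5), so it is dropped. Row c1 has means 4 and 5 (difference exactly 1), which is not higher than 1, so it stays.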