Home > Software engineering >  how to drop pandas df columns whose variance is in a tolerance rank?
how to drop pandas df columns whose variance is in a tolerance rank?

Time:09-17

Suppose the next df:

d={'month': ['01/01/2020', '01/02/2020', '01/03/2020', '01/01/2020', '01/02/2020', '01/03/2020'], 
   'country': ['Japan', 'Japan', 'Japan', 'Poland', 'Poland', 'Poland'], 
   'level':['A01', 'A01', 'A01', 'A00','A00', 'A00'],
   'job title':['Insights Manager', 'Insights Manager', 'Insights Manager', 'Sales Director', 'Sales Director', 'Sales Director'],
   'number':[0, 0.001, 0, 0, 0, 0],
   'age':[24, 22, 45, 38, 60, 32]}


df=pd.DataFrame(d)

When trying to get the variance for all columns the next result arises:

import pandas as pd

df.agg("var")

Result:

number    1.666667e-07
age       2.025667e 02
dtype: float64

The idea is to remove columns which are in a certain rank, say, drop it if column variance is in a rank between 0 and 0.0001, (i.e: drop number column as it's variance is within this rank).

When tried to do this:

df= df.loc[:, 0 < df.std() < .0001]

The next error arises:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is it possible to drop pandas df column whose variance is within a tolerance rank?

CodePudding user response:

You can't use chained comparison operator with pandas series, since under the hood, they are translated to and which only works with scalars. Use vectorized & instead:

df[(((v := df.var()) > 0) & (v < 0.0001))[lambda x: x].index]

   number
0   0.000
1   0.001
2   0.000
3   0.000
4   0.000
5   0.000

CodePudding user response:

Another solution (using .between .drop(columns=...))

var = df.agg("var", numeric_only=True)
df = df.drop(columns=var[var.between(0, 0.0001)].index)
print(df)

Prints:

        month country level         job title  age
0  01/01/2020   Japan   A01  Insights Manager   24
1  01/02/2020   Japan   A01  Insights Manager   22
2  01/03/2020   Japan   A01  Insights Manager   45
3  01/01/2020  Poland   A00    Sales Director   38
4  01/02/2020  Poland   A00    Sales Director   60
5  01/03/2020  Poland   A00    Sales Director   32
  • Related