Select rows if the value for certain columns are the same in pandas?

Time:04-15

I have the following dataframe

id var1 var2 var3 .... var26  var27 var28
A   6    5    5   ....    0     0    nan
B   5    5    5   ....    5     5     5
C   3    3    3   ....    3     nan  nan
D   5    5    5   ....    5     5     2
.
.

I want to keep the rows where all columns have the same value (in this case, the second row, where id is B).

I also want to keep the rows where the first n columns have the same value (if n=26, this includes the third row, where id is "C").

For the first case, I tried:

lambda x: min(x) == max(x)

but the problem is that it also picks up rows that have only one non-null value. So I need a way to select rows based on the value in every column.

Any help would be appreciated!

CodePudding user response:

Your first request can be done with:

df[df.filter(like='var').eq(df['var1'],axis=0).all(axis=1)]

The second:

n = 26
df[df.filter(like='var').iloc[:,:n].eq(df['var1'],axis=0).all(axis=1)]

Note that we cannot use nunique here, because it ignores NaN values by default.
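To see how the two filters behave, here is a minimal sketch on a made-up frame. The four var columns and their values are assumptions standing in for var1..var28 from the question, and n=3 plays the role of n=26. It also shows why nunique is not a safe shortcut here.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-column stand-in for the question's var1..var28 frame.
df = pd.DataFrame(
    {
        "id": ["A", "B", "C", "D"],
        "var1": [6, 5, 3, 5],
        "var2": [5, 5, 3, 5],
        "var3": [0, 5, 3, 5],
        "var4": [np.nan, 5, np.nan, 2],
    }
)

# Keep only the var* columns for the comparison.
vars_ = df.filter(like="var")

# Case 1: every var column must equal var1.
# NaN never compares equal to anything, so row C is dropped.
all_same = df[vars_.eq(df["var1"], axis=0).all(axis=1)]

# Case 2: only the first n var columns must equal var1.
n = 3
first_n_same = df[vars_.iloc[:, :n].eq(df["var1"], axis=0).all(axis=1)]

# nunique skips NaN by default, so row C would wrongly pass an "all same" test.
nunique_mask = vars_.nunique(axis=1) == 1

print(all_same["id"].tolist())
print(first_n_same["id"].tolist())
print(nunique_mask.tolist())
```

With this sample data, case 1 keeps only B, while case 2 keeps B, C and D, because D's first three columns also match.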

CodePudding user response:

Here is a more elegant solution:

df.iloc[:,:26][df.iloc[:,:26].var(axis=1) == 0]

If you want all columns considered, drop the 26 and use only : as the column slice.
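One caveat with the variance approach: var skips NaN by default, so a row whose non-null values are all equal still gets variance 0. Below is a small sketch on made-up data, assuming id is the index so that only numeric columns remain. Passing skipna=False is one possible way to exclude rows that contain NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; id is assumed to be the index, not a column.
df = pd.DataFrame(
    {
        "var1": [6, 5, 3, 7],
        "var2": [5, 5, 3, np.nan],
        "var3": [np.nan, 5, np.nan, np.nan],
    },
    index=["A", "B", "C", "D"],
)

# Default behaviour: NaN is skipped, so C (3, 3, NaN) has variance 0.
# D has a single non-null value, so its sample variance is NaN, not 0.
zero_var = df.var(axis=1) == 0

# With skipna=False, any NaN in a row makes its variance NaN,
# so only rows that are fully populated and constant remain.
strict_zero_var = df.var(axis=1, skipna=False) == 0

print(df[zero_var].index.tolist())
print(df[strict_zero_var].index.tolist())
```

Comparing a floating-point variance to exactly 0 works for small integers like these. For arbitrary floats, a tolerance check such as np.isclose may be safer.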
