I have the following dataframe:
id var1 var2 var3 .... var26 var27 var28
A 6 5 5 .... 0 0 nan
B 5 5 5 .... 5 5 5
C 3 3 3 .... 3 nan nan
D 5 5 5 .... 5 5 2
.
.
I want to keep rows where the values in all columns are the same (in this case, the second row, where id is B),
and I also want to keep rows where the values in the first n columns are the same (if n=26, the third row, where id is "C").
For the first case I tried
lambda x: min(x) == max(x)
but the problem is that it also picks up rows where there is only one non-null value. So I need a way to select rows based on the value in every column.
Any help would be appreciated!
CodePudding user response:
Your first request can be done with
df[df.filter(like='var').eq(df['var1'], axis=0).all(axis=1)]
For the second:
n = 26
df[df.filter(like='var').iloc[:, :n].eq(df['var1'], axis=0).all(axis=1)]
Note that we cannot use nunique here, because it ignores NaN values by default.
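To see the difference between comparing against var1 and counting unique values, here is a minimal sketch on a made-up 4-column stand-in for the frame in the question (the data, n = 2, and the variable names are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical 4-column stand-in for the 28-column frame in the question.
df = pd.DataFrame(
    {
        "id": ["A", "B", "C", "D"],
        "var1": [6, 5, 3, 5],
        "var2": [5, 5, 3, 5],
        "var3": [0, 5, np.nan, 5],
        "var4": [np.nan, 5, np.nan, 2],
    }
)

vals = df.filter(like="var")  # keeps var1..var4, drops id

# Every column must equal var1; NaN never compares equal, so rows with NaN fail.
all_same = vals.eq(vals["var1"], axis=0).all(axis=1)

# Only the first n columns must equal var1 (n = 2 here instead of 26).
n = 2
first_n_same = vals.iloc[:, :n].eq(vals["var1"], axis=0).all(axis=1)

# nunique drops NaN by default, so row C (3, 3, NaN, NaN) slips through.
nunique_same = vals.nunique(axis=1).eq(1)

ids_all = df.loc[all_same, "id"].tolist()
ids_first_n = df.loc[first_n_same, "id"].tolist()
ids_nunique = df.loc[nunique_same, "id"].tolist()

print(ids_all)      # ['B']
print(ids_first_n)  # ['B', 'C', 'D']
print(ids_nunique)  # ['B', 'C']
```

Row D also passes the first-n check in this toy data, since its leading columns are all 5; the same would hold for row D in the question with n=26.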
CodePudding user response:
Here is a more elegant solution:
df.iloc[:,:26][df.iloc[:,:26].var(axis=1) == 0]
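If you want all columns considered, drop the 26 and use only : as the column slice. This assumes id is the index, so that the first 26 columns are var1 to var26. Also note that var skips NaN by default, so pass skipna=False if rows containing NaN should not match.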
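One caveat with the variance approach is how it treats NaN. A small sketch on made-up data, assuming id is the index (the frame and mask names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame with id as the index, so every column is numeric.
df = pd.DataFrame(
    {
        "var1": [6, 5, 3, 5],
        "var2": [5, 5, 3, 5],
        "var3": [0, 5, np.nan, 5],
        "var4": [np.nan, 5, np.nan, 2],
    },
    index=pd.Index(["A", "B", "C", "D"], name="id"),
)

# Default skipna=True: NaN is ignored, so row C (3, 3, NaN, NaN) has variance 0.
default_mask = df.var(axis=1) == 0

# skipna=False: any NaN makes the row variance NaN, and NaN == 0 is False.
strict_mask = df.var(axis=1, skipna=False) == 0

ids_default = df.index[default_mask].tolist()
ids_strict = df.index[strict_mask].tolist()

print(ids_default)  # ['B', 'C']
print(ids_strict)   # ['B']
```

With non-integer floats, the variance of identical values may not come out as exactly 0, so comparing against var1 with eq is probably the safer check in that case.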