how to check if the selected columns in a dataframe return True and how to compare the count of True values with the column name count that is handling the count number of cells that are NOT NULL.
WHERE these selected columns can have a False as value.
for this i used the below statement but i do not know how to compare it with the column count
df=pd.DataFrame({'dept':['dept1','dept2','dept3','dept4','dept5'],
'subd':['hndf','nbf','asx','qwe','def'],
'jju':['0','1','1','NA','1'],
'rob':['1','0','NA','1','1'],
'ans':['0','0','1','NA','1'],
'zsd':['1','NA','1','1','1'],
'count':['4','3','3','2','4']}
dept subd jju rob ans zsd count
0 dept1 hndf 0 1 0 1 4
1 dept2 nbf 1 0 0 NA 3
2 dept3 asx 1 NA 1 1 3
3 dept4 qwe NA 1 NA 1 2
4 dept5 def 1 1 1 1 4
df = df[df.loc[:,'jju':'zsd'].eq("1")]
how to continue from this ???
expected result :
0 dept3 asx 1 NA 1 1 3
1 dept4 qwe NA 1 NA 1 2
2 dept5 def 1 1 1 1 4
CodePudding user response:
You can replace "NA"
by NaN values, then sum
on axis, then use eq
to compare to count
column value to create a boolean mask. Then use the mask to filter df
:
msk = df['count'].astype(int).eq(df.loc[:, 'jju':'zsd'].replace('NA', float('nan')).astype(float).eq(1).sum(axis=1))
out = df[msk]
Output:
dept subd jju rob ans zsd count
2 dept3 asx 1 NA 1 1 3
3 dept4 qwe NA 1 NA 1 2
4 dept5 def 1 1 1 1 4
CodePudding user response:
This does the job,
df.replace("NA", np.nan, inplace = True)
df.iloc[:, 2:] = df.iloc[:, 2:].apply(pd.to_numeric, errors = "coerce", axis = 1)
Output -
dept | subd | jju | rob | ans | zsd | count | |
---|---|---|---|---|---|---|---|
0 | dept1 | hndf | 0.0 | 1.0 | 0.0 | 1.0 | 4.0 |
1 | dept2 | nbf | 1.0 | 0.0 | 0.0 | nan | 3.0 |
2 | dept3 | asx | 1.0 | nan | 1.0 | 1.0 | 3.0 |
3 | dept4 | qwe | nan | 1.0 | nan | 1.0 | 2.0 |
4 | dept5 | def | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 |