I am working with a pandas dataframe that is represented as a large truth matrix.
a1 b2 c3 d4 e5 f6
a1 True True False True False True
b2 True True False True True True
c3 False False True True True False
d4 True True True True False True
e5 False True True False True False
f6 True True False True False True
I can grab the counts of rows and their respective True
counts with:
df[df == True].count(axis=0)
Which would return:
a1: 4
b2: 5
c3: 3
d4: 5
e5: 3
f6: 4
Likewise, I can obtain the index with the most True values and it's name with:
max_count = max(df[df == True].count(axis=0)
id = df[df == True].count(axis=0).idxmax(axis=0)
Which returns max_count=5
and id=b2
I am curious then how I can query that row to return all indices with True in a list.
The ideal output would be:
ids = [a1, b2, d4, e5, f6]
I have tried this:
ids = df[id == True].index.tolist()
Which results in a key error and many variations of the above.
CodePudding user response:
First, because you have a boolean matrix here, you can use sum
with axis=1 to get counts of True values for each row index.
df.sum(axis=1)
Output:
a1 4
b2 5
c3 3
d4 5
e5 3
f6 4
dtype: int64
Now, let's find "all" the indexes that have the most True values, using np.where
:
max_true_index = df.index[np.where(df.sum(axis=1) == df.sum(axis=1).max())]
max_true_index returns:
Index(['b2', 'd4'], dtype='object')
Finally, let's return all indexes that have True in these two values or from this list:
df.index[df[max_true_index].all(axis=1)]
Output:
Index(['a1', 'b2', 'd4', 'f6'], dtype='object')