Home > Net >  Aggregating column names that are True in a list
Aggregating column names that are True in a list

Time:05-10

I am working with a pandas dataframe that is represented as a large truth matrix.

       a1     b2      c3     d4     e5     f6
a1    True   True    False  True   False  True
b2    True   True    False  True   True   True
c3    False  False   True   True   True   False
d4    True   True    True   True   False  True
e5    False  True    True   False  True   False
f6    True   True    False  True   False  True

I can grab the counts of rows and their respective True counts with:

df[df == True].count(axis=0)

Which would return:

a1: 4
b2: 5
c3: 3
d4: 5
e5: 3
f6: 4

Likewise, I can obtain the index with the most True values and it's name with:

max_count = max(df[df == True].count(axis=0)
id = df[df == True].count(axis=0).idxmax(axis=0)

Which returns max_count=5 and id=b2 I am curious then how I can query that row to return all indices with True in a list.

The ideal output would be:

ids = [a1, b2, d4, e5, f6]

I have tried this:

ids = df[id == True].index.tolist()

Which results in a key error and many variations of the above.

CodePudding user response:

First, because you have a boolean matrix here, you can use sum with axis=1 to get counts of True values for each row index.

df.sum(axis=1)

Output:

a1    4
b2    5
c3    3
d4    5
e5    3
f6    4
dtype: int64

Now, let's find "all" the indexes that have the most True values, using np.where:

max_true_index = df.index[np.where(df.sum(axis=1) == df.sum(axis=1).max())]

max_true_index returns:

Index(['b2', 'd4'], dtype='object')

Finally, let's return all indexes that have True in these two values or from this list:

df.index[df[max_true_index].all(axis=1)]

Output:

Index(['a1', 'b2', 'd4', 'f6'], dtype='object')
  • Related