Home > Back-end >  How to find which column contains a certain value?
How to find which column contains a certain value?

Time:10-27

I have a dataframe like this:

test = pd.DataFrame({"id":[1,2,3,4],
                     "name_1":["peter","bobby","alex","chris"],
                     "name_1_flag":["real","fake","fake","real"],
                     "name_2":["hector","abi","henrik","miko"],
                     "name_2_flag":["fake","real","fake","fake"],
                     "name_3":["hans","khan","will","than"],
                     "name_3_flag":["fake","fake","real","fake"]})

   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag
0   1  peter        real  hector        fake   hans        fake
1   2  bobby        fake     abi        real   khan        fake
2   3   alex        fake  henrik        fake   will        real
3   4  chris        real    miko        fake   than        fake

How can I find the row/column tuple that has the word "real" in it.

Optimally the output would be an array or series like this:

    col_index
0           3
1           5
2           7
3           3

CodePudding user response:

Use np.where:

test["col_index"] = np.where(test.eq("real"))[1]   1
print(test)

Output

   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag  col_index
0   1  peter        real  hector        fake   hans        fake          3
1   2  bobby        fake     abi        real   khan        fake          5
2   3   alex        fake  henrik        fake   will        real          7
3   4  chris        real    miko        fake   than        fake          3

CodePudding user response:

Solutions:

Try np.argmax:

>>> np.argmax(test.eq('real').to_numpy(), axis=1)   1
array([3, 5, 7, 3], dtype=int64)
>>> 

--- Or get_indexer:

test.columns.get_indexer(test.eq('real').idxmax(axis=1))   1

Or .T.reset_index(drop=True):

test.T.reset_index(drop=True).eq('real').idxmax()   1

Making it a column:

np.argmax:

test["col_index"] = np.argmax(test.eq('real').to_numpy(), axis=1)   1

The one with get_indexer:

test["col_index"] = test.columns.get_indexer(test.eq('real').idxmax(axis=1))   1

.T:

test["col_index"] = test.T.reset_index(drop=True).eq('real').idxmax()   1



All Output:

   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag  col_index
0   1  peter        real  hector        fake   hans        fake          3
1   2  bobby        fake     abi        real   khan        fake          5
2   3   alex        fake  henrik        fake   will        real          7
3   4  chris        real    miko        fake   than        fake          3

CodePudding user response:

Let us try with

s = test.where(lambda x : x=='real').stack()
test['new'] = test.columns.get_indexer(s.index.get_level_values(1)) 1
test
Out[11]: 
   id name_1 name_1_flag  name_2 name_2_flag name_3 name_3_flag  new
0   1  peter        real  hector        fake   hans        fake    3
1   2  bobby        fake     abi        real   khan        fake    5
2   3   alex        fake  henrik        fake   will        real    7
3   4  chris        real    miko        fake   than        fake    3

CodePudding user response:

You can also use dot:

print (test.eq("real").dot(range(test.columns.size)) 1)

0    3
1    5
2    7
3    3
dtype: int32
  • Related