I have a dataframe like this:
test = pd.DataFrame({"id":[1,2,3,4],
"name_1":["peter","bobby","alex","chris"],
"name_1_flag":["real","fake","fake","real"],
"name_2":["hector","abi","henrik","miko"],
"name_2_flag":["fake","real","fake","fake"],
"name_3":["hans","khan","will","than"],
"name_3_flag":["fake","fake","real","fake"]})
id name_1 name_1_flag name_2 name_2_flag name_3 name_3_flag
0 1 peter real hector fake hans fake
1 2 bobby fake abi real khan fake
2 3 alex fake henrik fake will real
3 4 chris real miko fake than fake
How can I find the row/column tuple that has the word "real" in it.
Optimally the output would be an array or series like this:
col_index
0 3
1 5
2 7
3 3
CodePudding user response:
Use np.where:
test["col_index"] = np.where(test.eq("real"))[1] 1
print(test)
Output
id name_1 name_1_flag name_2 name_2_flag name_3 name_3_flag col_index
0 1 peter real hector fake hans fake 3
1 2 bobby fake abi real khan fake 5
2 3 alex fake henrik fake will real 7
3 4 chris real miko fake than fake 3
CodePudding user response:
Solutions:
Try np.argmax
:
>>> np.argmax(test.eq('real').to_numpy(), axis=1) 1
array([3, 5, 7, 3], dtype=int64)
>>>
--- Or get_indexer
:
test.columns.get_indexer(test.eq('real').idxmax(axis=1)) 1
Or .T.reset_index(drop=True)
:
test.T.reset_index(drop=True).eq('real').idxmax() 1
Making it a column:
np.argmax
:
test["col_index"] = np.argmax(test.eq('real').to_numpy(), axis=1) 1
The one with get_indexer
:
test["col_index"] = test.columns.get_indexer(test.eq('real').idxmax(axis=1)) 1
.T
:
test["col_index"] = test.T.reset_index(drop=True).eq('real').idxmax() 1
All Output:
id name_1 name_1_flag name_2 name_2_flag name_3 name_3_flag col_index
0 1 peter real hector fake hans fake 3
1 2 bobby fake abi real khan fake 5
2 3 alex fake henrik fake will real 7
3 4 chris real miko fake than fake 3
CodePudding user response:
Let us try with
s = test.where(lambda x : x=='real').stack()
test['new'] = test.columns.get_indexer(s.index.get_level_values(1)) 1
test
Out[11]:
id name_1 name_1_flag name_2 name_2_flag name_3 name_3_flag new
0 1 peter real hector fake hans fake 3
1 2 bobby fake abi real khan fake 5
2 3 alex fake henrik fake will real 7
3 4 chris real miko fake than fake 3
CodePudding user response:
You can also use dot
:
print (test.eq("real").dot(range(test.columns.size)) 1)
0 3
1 5
2 7
3 3
dtype: int32