What is the best search method for finding entries in a dataframe based on their column values. My dataframe is formatted like this:
| header a | header b | header c | header d | header e |
|----------|----------|----------|----------|----------|
| 1 | 1 | 1 | 1 | 1 |
| 3 | 2 | 5 | 6 | 2 |
| 2 | 2 | 2 | 2 | 2 |
There are hundreds of unsorted rows and I am looking for the row with values [3,2,5,6,2]. The brute force approach would be to check if the first column contains the value 3. If so, check if the second column contains the value 2, etc. Is there a more optimized approach to this?
CodePudding user response:
In mask it is checked for equality with the searched list. The indexes are then stored in the list generator, where all values matched. The print outputs the found lines.
import pandas as pd
df = pd.DataFrame({'header a':[1,3,2,2],'header b':[1,2,2,3],'header c':[1,5,2,5],
'header d':[1,6,2,6],'header e':[1,2,2,2]})
mask = df.values == [3,2,5,6,2]
result = [i for i in range(0, len(mask)) if mask[i].all() == True]
print(df.loc[result,:])
Output
header a header b header c header d header e
1 3 2 5 6 2