I tried the method suggested in Lookup a Dataframe column with list and return list matching the row of another column? to look up the column value,Col1
that has the elements in the query
list matching Col2
, but faced with
KeyError: "None of [Index(['watermelon', 'grape'], dtype='object', name='Col1')] are in the [index]"
Code used
df = pd.DataFrame({'Col1': ['Jack', 'Mary', 'Andrew'],
'Col2': [['apple', 'banana', 'papaya', 'tomato'],['berry', 'juice', 'watermelon'], ['cabbage', 'grape']]})
query = ['watermelon', 'grape']
q = df.set_index('Col1').loc[query]
Expecting
Col1
0 Mary
1 Andrew
CodePudding user response:
Suggested method won't work for your query, because you want the opposite.
loc
is a function to query an index, not a column. Ie. the query can be something within this list ['Jack', 'Mary', 'Andrew']
, which is why you are facing an error.
Also, unlike the dataframe from the suggested method, your column contains nested list:
Col1 Col2
0 Jack [apple, banana, papaya, tomato]
1 Mary [berry, juice, watermelon]
2 Andrew [cabbage, grape]
As far as I understand, you want to find those Col1
values where Col2
matches one of the queries. We can create a new Pandas Series, by applying a function that will search if the fruit is in the query.
df_ = df.set_index('Col1')['Col2'].apply(lambda fruits: any([f in query for f in fruits]))
Output:
print(df_[df_])
Col1
Mary True
Andrew True
Name: Col2, dtype: bool
Let me know if this was your intention.
CodePudding user response:
Here you go -
df = pd.DataFrame({'Col1': ['Jack', 'Mary', 'Andrew'],
'Col2': [['apple', 'banana', 'papaya', 'tomato'],['berry', 'juice', 'watermelon'], ['cabbage', 'grape']]})
query = ['watermelon', 'grape']
df_x = df.explode('Col2')
df_x.set_index('Col2').loc[query].reset_index(drop=True)
And this is the output -
Col1
0 Mary
1 Andrew