Home > Software engineering >  Lookup a Dataframe column that has rows of lists with list and return matching column value of anoth
Lookup a Dataframe column that has rows of lists with list and return matching column value of anoth

Time:07-01

I tried the method suggested in Lookup a Dataframe column with list and return list matching the row of another column? to look up the column value,Col1 that has the elements in the query list matching Col2, but faced with


KeyError: "None of [Index(['watermelon', 'grape'], dtype='object', name='Col1')] are in the [index]"

Code used

df = pd.DataFrame({'Col1': ['Jack', 'Mary', 'Andrew'], 
                   'Col2': [['apple', 'banana', 'papaya', 'tomato'],['berry', 'juice', 'watermelon'], ['cabbage', 'grape']]})

query = ['watermelon', 'grape']

q = df.set_index('Col1').loc[query]

Expecting

   Col1
0  Mary
1  Andrew


CodePudding user response:

Suggested method won't work for your query, because you want the opposite. loc is a function to query an index, not a column. Ie. the query can be something within this list ['Jack', 'Mary', 'Andrew'], which is why you are facing an error.

Also, unlike the dataframe from the suggested method, your column contains nested list:

     Col1                             Col2
0    Jack  [apple, banana, papaya, tomato]
1    Mary       [berry, juice, watermelon]
2  Andrew                 [cabbage, grape]

As far as I understand, you want to find those Col1 values where Col2 matches one of the queries. We can create a new Pandas Series, by applying a function that will search if the fruit is in the query.

df_ = df.set_index('Col1')['Col2'].apply(lambda fruits: any([f in query for f in fruits]))

Output:

print(df_[df_])

Col1
Mary      True
Andrew    True
Name: Col2, dtype: bool

Let me know if this was your intention.

CodePudding user response:

Here you go -

df = pd.DataFrame({'Col1': ['Jack', 'Mary', 'Andrew'], 
                   'Col2': [['apple', 'banana', 'papaya', 'tomato'],['berry', 'juice', 'watermelon'], ['cabbage', 'grape']]})

query = ['watermelon', 'grape']
df_x = df.explode('Col2')
df_x.set_index('Col2').loc[query].reset_index(drop=True)

And this is the output -

    Col1
0   Mary
1   Andrew
  • Related