Select rows of pandas dataframe based on string in nested list

Time:08-03

How can I select a subset of a pandas DataFrame based on whether a column that holds nested lists contains a given string?

import pandas as pd

df = pd.DataFrame({
    'id': [12, 34, 43],
    'course': ['Mathematics', 'Sport', 'Biology'],
    'students': [['John Doe', 'Peter Parker', 'Lois Lane'],
                 ['Bruce Banner', 'Lois Lane'],
                 ['John Doe', 'Bruce Banner']],
})

Now I would like to select all rows in which "John Doe" is among the students.

Answer 1:

df[df['students'].apply(lambda students: 'John Doe' in students)]
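A minimal self-contained sketch of the `apply` approach, wrapped in a small helper. The name `rows_containing` is hypothetical (it is not part of pandas). `Series.apply` calls the lambda once per cell; each cell here is a plain Python list, so `in` is an ordinary list membership test, and the resulting boolean Series is used as a row mask.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [12, 34, 43],
    'course': ['Mathematics', 'Sport', 'Biology'],
    'students': [['John Doe', 'Peter Parker', 'Lois Lane'],
                 ['Bruce Banner', 'Lois Lane'],
                 ['John Doe', 'Bruce Banner']],
})

def rows_containing(frame, column, value):
    """Hypothetical helper: keep rows whose list in `column` contains `value`."""
    # apply returns one boolean per row, aligned with frame's index
    mask = frame[column].apply(lambda items: value in items)
    return frame[mask]

result = rows_containing(df, 'students', 'John Doe')
print(result['id'].tolist())  # [12, 43]
```

Because the lambda runs in Python for every row, this is simple and readable but not vectorized.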

Answer 2:

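Here is a vectorized option: explode the lists into one row per student, compare each element, then reduce back to one boolean per original row with `any()`: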

df[(df['students'].explode() == 'John Doe').groupby(level=0).any()]
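To make the one-liner easier to follow, the sketch below splits it into its three steps and prints the intermediate mask. It rebuilds the example DataFrame from the question so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [12, 34, 43],
    'course': ['Mathematics', 'Sport', 'Biology'],
    'students': [['John Doe', 'Peter Parker', 'Lois Lane'],
                 ['Bruce Banner', 'Lois Lane'],
                 ['John Doe', 'Bruce Banner']],
})

# Step 1: one row per list element; each original index label is repeated
exploded = df['students'].explode()
# Step 2: element-wise comparison, still carrying the repeated labels
matches = exploded == 'John Doe'
# Step 3: collapse back to one boolean per original row label
mask = matches.groupby(level=0).any()

result = df[mask]
print(mask.tolist())          # [True, False, True]
print(result['id'].tolist())  # [12, 43]
```

This relies on `df` having a unique index, because `groupby(level=0)` merges rows that share a label. If that is not guaranteed, call `df.reset_index(drop=True)` first.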