Using pandas, how do I loc a value where my column contains lists?

I have a DataFrame with a column that contains lists. For example:

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'age': [21, 23, 24, 28],
                   'occupation': ['data scientist',  'doctor',  'data analyst', 'engineer'],
                   'knowledge':[['python','c  '], ['python', 'c#'], ['css','js','html'], ['c#']],
                  })

Now I want to select only the rows that have 'python' as one of the values in the 'knowledge' list. How do I do that?

I tried pd.loc[(pd['knowledge'].isin['python'])], but it didn't work.

(edited to fix the code)

CodePudding user response：

You can build a boolean mask with a list comprehension that loops over the lists:

df[['python' in l for l in df['knowledge']]]

output:

    name  age      occupation      knowledge
0  name1   21  data scientist  [python, c  ]
1  name2   23          doctor   [python, c#]

alternatives

finding any element of a set

Keep rows whose list shares at least one element with the search set:

search = set(['python', 'js'])
df[[bool(search.intersection(l)) for l in df['knowledge']]]

output:

    name  age      occupation        knowledge
0  name1   21  data scientist    [python, c  ]
1  name2   23          doctor     [python, c#]
2  name3   24    data analyst  [css, js, html]

matching all elements of a set

Keep rows whose list contains every element of the search set:

search = set(['python', 'c  '])
df[[search <= set(l) for l in df['knowledge']]]

output:

    name  age      occupation      knowledge
0  name1   21  data scientist  [python, c  ]
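
If you need both variants, they could be wrapped in one small function. This is only a sketch: rows_matching and its require_all parameter are hypothetical names, not part of pandas.

```python
import pandas as pd

def rows_matching(df, column, search, require_all=False):
    """Hypothetical helper: keep rows whose list in `column` matches `search`."""
    search = set(search)
    if require_all:
        # Every search term must be present in the row's list.
        mask = [search <= set(items) for items in df[column]]
    else:
        # At least one search term must be present in the row's list.
        mask = [not search.isdisjoint(items) for items in df[column]]
    return df[mask]

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4'],
    'knowledge': [['python', 'c  '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']],
})

any_match = rows_matching(df, 'knowledge', ['python', 'js'])
all_match = rows_matching(df, 'knowledge', ['python', 'c#'], require_all=True)
print(any_match)
print(all_match)
```

Here isdisjoint is used instead of bool(search.intersection(l)); both express "at least one common element", and isdisjoint avoids building the intermediate set.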

CodePudding user response：

You can join each list into a space-separated string, then check whether it contains the word you want, using word boundaries.

m = df['knowledge'].str.join(' ').str.contains(r'\bpython\b')
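
One caveat with the regex approach, sketched below: \b treats characters such as # as non-word characters, so a pattern can match part of a list element. The search term 'c' and the variable names are chosen only for illustration, and re.escape is added here as a precaution for terms that contain regex metacharacters.

```python
import re
import pandas as pd

knowledge = pd.Series(
    [['python', 'c  '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']]
)

term = 'c'

# Regex on the joined string: 'c' followed by '#' or a space still
# sits on a word boundary, so 'c#' and 'c  ' both count as matches.
joined = knowledge.str.join(' ')
regex_mask = joined.str.contains(rf'\b{re.escape(term)}\b')

# Exact membership test: only a list element equal to 'c' would match.
exact_mask = knowledge.apply(lambda items: term in items)

print(regex_mask.tolist())
print(exact_mask.tolist())
```

For a term like 'python' the two masks agree on this data, but when elements contain punctuation or spaces an exact membership test is the safer choice.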

Or you can use Series.apply:

m = df['knowledge'].apply(lambda l: 'python' in l)

print(m)

0     True
1     True
2    False
3    False
Name: knowledge, dtype: bool

Then use boolean indexing to select the True rows:

print(df[m])

    name  age      occupation      knowledge
0  name1   21  data scientist  [python, c  ]
1  name2   23          doctor   [python, c#]
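
Since the question tried isin, here is a sketch of a different route that does use it: explode the lists into one row per element, test each element with isin, then collapse back to one boolean per original row. This is an alternative technique, not what either answer above proposes, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4'],
    'age': [21, 23, 24, 28],
    'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
    'knowledge': [['python', 'c  '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']],
})

# One row per list element; the original index label is repeated.
exploded = df['knowledge'].explode()

# Test each element, then reduce back to one boolean per original row.
m = exploded.isin(['python']).groupby(level=0).any()

result = df[m]
print(result)
```

Because isin takes a list, the same code handles several search terms at once, for example isin(['python', 'js']) for an "any of these" match.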