I have a DataFrame with a column that contains lists. For example:
import pandas as pd

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4'],
    'age': [21, 23, 24, 28],
    'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
    'knowledge': [['python', 'c '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']],
})
Now I want to select only the rows that have 'python' as one of the values in the 'knowledge' list. How do I do that?
I tried pd.loc[(pd['knowledge'].isin['python'])], but it didn't work.
(edited to fix the code)
CodePudding user response:
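isin tests whether each cell's value is one of the given values; it does not look inside a list stored in the cell. Loop over the column with a list comprehension instead: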
df[['python' in l for l in df['knowledge']]]
output:
    name  age      occupation     knowledge
0  name1   21  data scientist  [python, c ]
1  name2   23          doctor  [python, c#]
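For anyone who wants to paste and run it, here is the same filter as a self-contained script. It is only a minimal sketch: the frame is the one from the question, and the names mask and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4'],
    'age': [21, 23, 24, 28],
    'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
    'knowledge': [['python', 'c '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']],
})

# One boolean per row: True when 'python' is an element of that row's list.
mask = ['python' in skills for skills in df['knowledge']]

# A plain list of booleans, one per row, works as a row filter.
result = df[mask]
print(result)
```

The comprehension checks whole list elements, so 'python' will not match an entry such as 'python3'.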
Alternatives
Matching any element of a set
Keep rows with at least one match:
search = {'python', 'js'}
df[[bool(search.intersection(l)) for l in df['knowledge']]]
output:
    name  age      occupation        knowledge
0  name1   21  data scientist     [python, c ]
1  name2   23          doctor     [python, c#]
2  name3   24    data analyst  [css, js, html]
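If you would rather avoid the Python-level loop, the "any match" test can probably also be written with Series.explode. This is a different technique from the comprehension above, offered as a sketch; it assumes pandas 0.25 or later (where explode is available), and the names exploded and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4'],
    'age': [21, 23, 24, 28],
    'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
    'knowledge': [['python', 'c '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']],
})

search = {'python', 'js'}

# explode() yields one row per list element and repeats the original index label.
exploded = df['knowledge'].explode()

# Test every element, then collapse back to one boolean per original row.
mask = exploded.isin(search).groupby(level=0).any()

result = df[mask]
print(result['name'].tolist())  # ['name1', 'name2', 'name3']
```

Because the mask is rebuilt by grouping on the index, this relies on the frame having unique index labels.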
Matching all elements of a set
Keep only rows whose list contains every element:
search = {'python', 'c '}
df[[search <= set(l) for l in df['knowledge']]]
output:
    name  age      occupation     knowledge
0  name1   21  data scientist  [python, c ]
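The "all elements" check can also be wrapped in a small helper so the intent is easier to read. A sketch only: has_all is a hypothetical name, and the required set here uses 'c#' purely as an example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4'],
    'age': [21, 23, 24, 28],
    'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
    'knowledge': [['python', 'c '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']],
})


def has_all(items, required):
    """Return True when every value in `required` occurs in `items` (hypothetical helper)."""
    # set.issubset accepts any iterable, so the list does not need converting first.
    return set(required).issubset(items)


required = {'python', 'c#'}

# One boolean per row: True only when the row's list contains both values.
mask = [has_all(skills, required) for skills in df['knowledge']]

result = df[mask]
print(result['name'].tolist())  # ['name2']
```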
CodePudding user response:
You can join each list into a space-separated string, then check whether it contains the word you want, using word boundaries:
m = df['knowledge'].str.join(' ').str.contains(r'\bpython\b')
Or you can use Series.apply:
m = df['knowledge'].apply(lambda l: 'python' in l)
print(m)
0     True
1     True
2    False
3    False
Name: knowledge, dtype: bool
Then use boolean indexing to select the rows where the mask is True:
print(df[m])
    name  age      occupation     knowledge
0  name1   21  data scientist  [python, c ]
1  name2   23          doctor  [python, c#]
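One caveat with the join-and-regex approach: \b only matches between a word character and a non-word character, so it behaves unexpectedly for entries that end in punctuation, such as 'c#'. The sketch below compares the two masks on the 'knowledge' values from the question; the names regex_mask and apply_mask are illustrative.

```python
import pandas as pd

knowledge = pd.Series([['python', 'c '], ['python', 'c#'], ['css', 'js', 'html'], ['c#']])

# Regex approach: '#' is not a word character, so the trailing \b
# cannot match right after it and 'c#' is never found.
regex_mask = knowledge.str.join(' ').str.contains(r'\bc#\b')

# Membership approach: compares whole list elements, no regex involved.
apply_mask = knowledge.apply(lambda skills: 'c#' in skills)

print(regex_mask.tolist())  # [False, False, False, False]
print(apply_mask.tolist())  # [False, True, False, True]
```

For plain alphanumeric words like 'python' both approaches agree, but the membership test is the safer default when the values may contain symbols.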