I have the following dataframe:
df = pd.DataFrame({'Id':['1','2','3'],'List_Origin':[['A','B'],['B','C'],['A','B']]})
How can I get only the Ids whose List_Origin is exactly a certain list, for example ['A','B']? I would appreciate a solution that avoids loops.
Desired end result:
end_df = pd.DataFrame({'Id':['1','3'],'List_Origin':[['A','B'],['A','B']]})
CodePudding user response:
You can use apply and check like below:
>>> df[df['List_Origin'].apply(lambda x: x==['A', 'B'] or x==['A,B'])]
Id List_Origin
0 1 [A,B]
2 3 [A, B]
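For reference, here is a self-contained sketch of the same apply check, assuming the frame from the question as posted (every row holds separate strings, so only the ['A', 'B'] comparison is needed). The name `target` is mine, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'Id': ['1', '2', '3'],
                   'List_Origin': [['A', 'B'], ['B', 'C'], ['A', 'B']]})

target = ['A', 'B']  # hypothetical name for the list to match exactly

# apply calls the lambda once per row, so this is still a Python-level loop
mask = df['List_Origin'].apply(lambda x: x == target)
end_df = df[mask]
print(end_df)
```

Note that list equality is order-sensitive, so a row holding ['B', 'A'] would not match.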
CodePudding user response:
Unfortunately, comparisons on list-valued cells cannot be vectorized: the column has object dtype, so some Python-level loop is needed (apply is a loop too).
I am assuming first that you have ['A', 'B'] and not ['A,B'] in the first row:
end_df = df[[x==['A', 'B'] for x in df['List_Origin']]]
output:
Id List_Origin
0 1 [A, B]
2 3 [A, B]
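As a sketch of what the comprehension produces, assuming the frame from the question: it builds one boolean per row, and boolean indexing keeps the original row labels (0 and 2). The wanted frame in the question has a fresh 0, 1 index, so a reset_index call is added here; the names `mask` and `wanted` are mine.

```python
import pandas as pd

df = pd.DataFrame({'Id': ['1', '2', '3'],
                   'List_Origin': [['A', 'B'], ['B', 'C'], ['A', 'B']]})
wanted = pd.DataFrame({'Id': ['1', '3'],
                       'List_Origin': [['A', 'B'], ['A', 'B']]})

# One boolean per row: True where the cell equals ['A', 'B'] exactly
mask = [x == ['A', 'B'] for x in df['List_Origin']]
print(mask)  # [True, False, True]

# Filtering keeps labels 0 and 2; reset them to 0, 1 to match the wanted frame
end_df = df[mask].reset_index(drop=True)
print(end_df.to_dict('list') == wanted.to_dict('list'))
```

Drop the reset_index call if the original row labels should be preserved.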
If, really, you have a mix of ['A', 'B'] and ['A,B'], then use:
end_df = df[[','.join(x)=='A,B' for x in df['List_Origin']]]
output:
Id List_Origin
0 1 [A,B]
2 3 [A, B]
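To illustrate why the join works, here is a sketch on a hypothetical frame that mixes both forms (the frame `mixed` and the name `keys` are assumptions for illustration). Joining each list with a comma maps ['A', 'B'] and ['A,B'] to the same string, which can then be compared directly.

```python
import pandas as pd

# Hypothetical frame: row 0 holds one combined string, row 2 two separate ones
mixed = pd.DataFrame({'Id': ['1', '2', '3'],
                      'List_Origin': [['A,B'], ['B', 'C'], ['A', 'B']]})

# Both ['A,B'] and ['A', 'B'] collapse to the string 'A,B'
keys = [','.join(x) for x in mixed['List_Origin']]
print(keys)  # ['A,B', 'B,C', 'A,B']

end_df = mixed[[k == 'A,B' for k in keys]]
print(end_df)
```

The filtered rows keep their original cell contents; the joined strings are only used to build the mask.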