I have a pandas
DataFrame structured as follows:
ID Class
0 2431214 16,41,9
1 2497796 14,16,18,20,24,35,41,42
2 1407550 20,21,24,25,26,27,28,3,34,35,4,8
3 472723 1,17,22
4 423046 28,30,32,34,39,4,42,8
Class
is a string attribute. I want to filter rows where Class
contains at least one of the elements in a given list, for example, if my filter is l = [16, 27]
, I should get the following:
ID Class
0 2431214 16,41,9
1 2497796 14,16,18,20,24,35,41,42
2 1407550 20,21,24,25,26,27,28,3,34,35,4,8
This behaviour I'm looking for is similar to SQL's IN
statement, which works fine. However, pandas
' isin
is not working for me (it's returning entries with only exactly one of the filter classes):
l = [16, 27]
filtered = df.loc[df['Class'].isin(l)]
ID Class
210 30359 16
945 46307 27
3641 81222 27
3817 77365 27
4185 89828 16
How can I filter my DataFrame so that entries with at least one value satisfying the filter condition are returned?
CodePudding user response:
As you have strings, the most optimal is likely to use a regex with word boundaries:
pattern = '|'.join(map(str, l))
out = df[df['Class'].str.contains(fr'\b(?:{pattern})\b')]
Output:
ID Class
0 2431214 16,41,9
1 2497796 14,16,18,20,24,35,41,42
2 1407550 20,21,24,25,26,27,28,3,34,35,4,8
If you want to know which values were detected:
# first match
df['Class'].str.extract(fr'\b({pattern})\b', expand=False)
# all matches
df['Class'].str.extractall(fr'\b({pattern})\b')[0].groupby(level=0).agg(','.join)