Home > Enterprise >  Filter DataFrame rows with at least one value in list
Filter DataFrame rows with at least one value in list

Time:01-06

I have a pandas DataFrame structured as follows:

ID                                            Class
0  2431214                                   16,41,9
1  2497796                   14,16,18,20,24,35,41,42
2  1407550          20,21,24,25,26,27,28,3,34,35,4,8
3   472723                                   1,17,22
4   423046                     28,30,32,34,39,4,42,8

Class is a string attribute. I want to filter rows where Class contains at least one of the elements in a given list, for example, if my filter is l = [16, 27], I should get the following:

ID                                            Class
0  2431214                                   16,41,9
1  2497796                   14,16,18,20,24,35,41,42
2  1407550          20,21,24,25,26,27,28,3,34,35,4,8

This behaviour I'm looking for is similar to SQL's IN statement, which works fine. However, pandas' isin is not working for me (it's returning entries with only exactly one of the filter classes):

l = [16, 27]
filtered = df.loc[df['Class'].isin(l)]
ID               Class
210   30359        16
945   46307        27
3641  81222        27
3817  77365        27
4185  89828        16

How can I filter my DataFrame so that entries with at least one value satisfying the filter condition are returned?

CodePudding user response:

As you have strings, the most optimal is likely to use a regex with word boundaries:

pattern = '|'.join(map(str, l))

out = df[df['Class'].str.contains(fr'\b(?:{pattern})\b')]

Output:

        ID                             Class
0  2431214                           16,41,9
1  2497796           14,16,18,20,24,35,41,42
2  1407550  20,21,24,25,26,27,28,3,34,35,4,8

If you want to know which values were detected:

# first match
df['Class'].str.extract(fr'\b({pattern})\b', expand=False)
# all matches
df['Class'].str.extractall(fr'\b({pattern})\b')[0].groupby(level=0).agg(','.join)
  • Related