Filter dataframe using a list of strings

Time:10-05

I need to filter a DataFrame that has multiple rows and columns, using a list of strings.

The list looks like this, for example:

list = ['asd1','#asd2','asd3','#asd4','asd5'] 

Column to filter:

description
Hi this is the description #asd2 i need to filter this row of the dataframe
lalala

The column that I want to filter by this list contains all kinds of text, so something like "isin" or "contains" should be involved. The result should still be a dataframe with all the other columns, keeping only the rows whose text in that one column includes any of those substrings.

Any help is appreciated. Regards

CodePudding user response:

You can join the strings in the list with | to build a regex pattern, pass it to the Series.str.contains method, and use the resulting Boolean mask to index the dataframe:

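import re
# lst is the list of strings; re.escape keeps any regex metacharacters in them literal
pattern = re.compile('|'.join(map(re.escape, lst)))
# na=False treats missing descriptions as "no match" so the mask stays Boolean
out = df.loc[df['description'].str.contains(pattern, na=False)]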

# out
                                         description
0  Hi this is the description #asd2 i need to fil...
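For reference, here is a self-contained sketch of the same idea that can be run as-is. The extra 'id' column and the missing-value row are made up for illustration, just to show that the other columns survive the filter and that empty descriptions are skipped. It uses a plain string pattern instead of a compiled one, which should behave the same here.

```python
import re
import pandas as pd

lst = ['asd1', '#asd2', 'asd3', '#asd4', 'asd5']

# Hypothetical sample data: 'id' is an extra column added only for illustration
df = pd.DataFrame({
    'id': [1, 2, 3],
    'description': [
        'Hi this is the description #asd2 i need to filter this row of the dataframe',
        'lalala',
        None,
    ],
})

# Escape each string so it is matched literally, then join with | (regex "or")
pattern = '|'.join(re.escape(s) for s in lst)

# Boolean mask: True where the description contains any of the strings
mask = df['description'].str.contains(pattern, regex=True, na=False)

# Index with the mask; all columns are kept, only the rows are filtered
out = df.loc[mask]
print(out)
```

Only the first row should remain, since it is the only one whose description contains one of the strings.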

CodePudding user response:

Never call a list "list", because that shadows Python's built-in list type.

import pandas as pd

listt = ['asd1','#asd2','asd3','#asd4','asd5']
df = pd.DataFrame({'a':['asd1 lol xls','1 2 asd5','abc def fgh ijk']})

# for each row, collect the strings from listt that occur in column 'a'
df['isin'] = df.apply(lambda x: [s for s in listt if s in x['a']], axis=1)

After that, you can keep only the df rows that have non-empty lists in column 'isin'.
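That last step could look something like this. It is only a sketch: the name filtered is made up here, and the length check via map(len) is one of several ways to test for a non-empty list.

```python
import pandas as pd

listt = ['asd1', '#asd2', 'asd3', '#asd4', 'asd5']
df = pd.DataFrame({'a': ['asd1 lol xls', '1 2 asd5', 'abc def fgh ijk']})

# For each row, collect the strings from listt that occur in column 'a'
df['isin'] = df['a'].apply(lambda text: [s for s in listt if s in text])

# Keep only the rows where at least one string was found
filtered = df[df['isin'].map(len) > 0]
print(filtered)
```

The third row ('abc def fgh ijk') matches nothing, so it is dropped.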

Hope it helped
