I need to filter a dataframe with multiple rows and columns using a list of strings.
The list looks like this, for example:
list = ['asd1','#asd2','asd3','#asd4','asd5']
Column to filter:
description |
---|
Hi this is the description #asd2 i need to filter this row of the dataframe |
lalala |
The column I want to filter contains all kinds of text, so something like "isin" or "contains" should be involved. The result should still be a dataframe with all the other columns, keeping only the rows where this one column contains any of the substrings in the list.
Any help is appreciated. Regards
CodePudding user response:
You can join the strings in the list with | to build a regex pattern, pass that pattern to the Series.str.contains method, and use the resulting Boolean Series to index the dataframe:
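import re
# Escape each string so regex metacharacters match literally, then join with "|" (regex OR)
pattern = re.compile('|'.join(map(re.escape, lst))) # lst is the list of strings
# Keep only the rows whose description contains any of the strings
out = df.loc[df['description'].str.contains(pattern)]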
# out
description
0 Hi this is the description #asd2 i need to fil...
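To see the regex approach end to end, here is a minimal sketch on made-up data (the `id` column and the sample rows are assumptions, not from the question). It escapes each list entry so that it is matched literally and treats missing descriptions as non-matches:

```python
import re

import pandas as pd

# Hypothetical sample data; the `id` column is only there to show that
# the other columns survive the filtering.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "description": [
        "Hi this is the description #asd2 i need to filter this row",
        "lalala",
        None,
    ],
})
lst = ["asd1", "#asd2", "asd3", "#asd4", "asd5"]

# Escape each entry so it is matched literally, then join with "|" (regex OR).
pattern = "|".join(re.escape(s) for s in lst)

# na=False makes missing descriptions count as "no match".
mask = df["description"].str.contains(pattern, regex=True, na=False)
out = df.loc[mask]
print(out)
```

Passing `na=False` is a design choice: without it, missing values can leave gaps in the mask, which then cannot be used for Boolean indexing.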
CodePudding user response:
Never name a list "list": that shadows Python's built-in list type.
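import pandas as pd
listt = ['asd1','#asd2','asd3','#asd4','asd5']
df = pd.DataFrame({'a':['asd1 lol xls','1 2 asd5','abc def fgh ijk']})
# For each row, collect the list entries that occur as substrings of column 'a'
df['isin'] = df.apply(lambda x: [s for s in listt if s in x['a']], axis=1)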
After that, you can keep only the rows of df whose list in column 'isin' is non-empty.
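That last step can be sketched as follows, reusing the sample data from this answer; the names `mask` and `filtered` are introduced here purely for illustration:

```python
import pandas as pd

listt = ['asd1', '#asd2', 'asd3', '#asd4', 'asd5']
df = pd.DataFrame({'a': ['asd1 lol xls', '1 2 asd5', 'abc def fgh ijk']})

# Collect, per row, the list entries found as substrings of column 'a'.
df['isin'] = df['a'].apply(lambda text: [s for s in listt if s in text])

# An empty list has length 0, so the mask is True only for rows with a match.
mask = df['isin'].map(len) > 0
filtered = df.loc[mask]
print(filtered)
```

The helper column can be dropped afterwards with `filtered.drop(columns='isin')` if only the original columns are needed.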
Hope it helped