How to filter values in columns of df that contains all substring in passed list in Pandas DataFrame-CodePudding

Any suggestion how to stay only values that contains all substring that in list for any column?:

import pandas as pd

df = pd.DataFrame(
    [
    [1, 'foollish', 'molish'], 
    [2, 'barnylishon', 'chacha'], 
    [3, 'bazon', 'gazon'],
    ], 
    columns=['id', 'value_1', 'value_2'])

print (df)
    
search_list = ['a','on']

print ("Desire result for value_1 column:")

df_desire_result = pd.DataFrame(
    [
    [1, 'barnylishon', 'chacha'], 
    [2, 'bazon', 'gazon'], 
    ], 
    columns=['id', 'value_1', 'value_2'])

print (df_desire_result)

CodePudding user response：

From this statement that contains all substring that in list for any column? : I figure that if any column in a row has all the substrings in the search_list then retain that row and drop remaining rows.

Then IIUC:

cols = df.columns.drop('id').tolist()
m = df[cols].apply(lambda x: all([any(x.str.contains(s)) for s in search_list]), axis=1)
out = df[m]

print(out):

   id      value_1 value_2
1   2  barnylishon  chacha
2   3        bazon   gazon

CodePudding user response：

You can use:

# craft regex pattern
import re
pattern = '|'.join(map(re.escape, search_list))
# 'a|on'

out = df.loc[(df
   # extract words from all cells
   .filter(like='value')
   .stack()
   .str.extractall(fr'({pattern})')[0]
   # ensure that each word is present at least once per row
   .groupby(level=0).nunique()
   .eq(len(search_list))
   .reindex(df.index, fill_value=False)
 )]

print(out)

Output:

   id      value_1 value_2
1   2  barnylishon  chacha
2   3        bazon   gazon