Any suggestion how to stay only values that contains all substring that in list for any column?:
import pandas as pd
df = pd.DataFrame(
[
[1, 'foollish', 'molish'],
[2, 'barnylishon', 'chacha'],
[3, 'bazon', 'gazon'],
],
columns=['id', 'value_1', 'value_2'])
print (df)
search_list = ['a','on']
print ("Desire result for value_1 column:")
df_desire_result = pd.DataFrame(
[
[1, 'barnylishon', 'chacha'],
[2, 'bazon', 'gazon'],
],
columns=['id', 'value_1', 'value_2'])
print (df_desire_result)
CodePudding user response:
From this statement that contains all substring that in list for any column? : I figure that if any column in a row has all the substrings in the search_list
then retain that row and drop remaining rows.
Then IIUC:
cols = df.columns.drop('id').tolist()
m = df[cols].apply(lambda x: all([any(x.str.contains(s)) for s in search_list]), axis=1)
out = df[m]
print(out):
id value_1 value_2
1 2 barnylishon chacha
2 3 bazon gazon
CodePudding user response:
You can use:
# craft regex pattern
import re
pattern = '|'.join(map(re.escape, search_list))
# 'a|on'
out = df.loc[(df
# extract words from all cells
.filter(like='value')
.stack()
.str.extractall(fr'({pattern})')[0]
# ensure that each word is present at least once per row
.groupby(level=0).nunique()
.eq(len(search_list))
.reindex(df.index, fill_value=False)
)]
print(out)
Output:
id value_1 value_2
1 2 barnylishon chacha
2 3 bazon gazon