I have a list of letters:
letters = ['E', 'H', 'T', 'D']
I have a dataframe with the following rows:
letter_1 letter_2 letter_3 letter_4 letter_5 word
0 D E B U T DEBUT
1 D E B U G DEBUG
2 B E G E T BEGET
3 D E P T H DEPTH
4 D U V E T DUVET
I am trying to filter out all rows that do not contain ALL of the items in the letters list.
Answer:
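You can use set operations: build the set of letters in each row and keep the rows whose set is a superset of set(letters) (the >= comparison):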
df[df.filter(like='letter').agg(set, axis=1) >= set(letters)]
or, using the "word" column:
df[df['word'].apply(set) >= set(letters)]
output:
letter_1 letter_2 letter_3 letter_4 letter_5 word
3 D E P T H DEPTH
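A self-contained sketch of the set-based filter. The frame is rebuilt from the five words shown in the question; that construction is an assumption, since the question only shows the printed frame. Both variants should select the same row.

```python
import pandas as pd

letters = ['E', 'H', 'T', 'D']
words = ['DEBUT', 'DEBUG', 'BEGET', 'DEPTH', 'DUVET']

# Rebuild the example frame (assumed layout): one column per letter, plus the word
df = pd.DataFrame([list(w) for w in words],
                  columns=[f'letter_{i}' for i in range(1, 6)])
df['word'] = words

target = set(letters)

# Row-wise: set of the letter_* cells, kept if it is a superset of the targets
by_cells = df[df.filter(like='letter').apply(set, axis=1) >= target]

# Word-wise: set of the characters in each word
by_word = df[df['word'].apply(set) >= target]

print(by_cells['word'].tolist())  # ['DEPTH']
print(by_word['word'].tolist())   # ['DEPTH']
```

Comparing a Series of sets with a single set is evaluated element by element, so each row's set is tested with Python's own superset operator.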
Answer:
Another approach uses numpy broadcasting: it compares every cell with every letter in one step, then checks that each letter matches at least one cell in the row:
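import numpy as np

# compare every cell with every letter: shape (n_letters, n_rows, n_cols)
m = (df.filter(like='letter').to_numpy() == np.array(letters)[:, None, None]
     ).any(2).all(0)  # any column matches, for all letters
df[m]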
output:
letter_1 letter_2 letter_3 letter_4 letter_5 word
3 D E P T H DEPTH
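To make the axes in the broadcasting answer easier to follow, here is a sketch that names each intermediate array and prints its shape. The frame is again rebuilt from the question's words, which is an assumption about how it was created.

```python
import numpy as np
import pandas as pd

letters = ['E', 'H', 'T', 'D']
words = ['DEBUT', 'DEBUG', 'BEGET', 'DEPTH', 'DUVET']
df = pd.DataFrame([list(w) for w in words],
                  columns=[f'letter_{i}' for i in range(1, 6)])
df['word'] = words

cells = df.filter(like='letter').to_numpy()  # (5 rows, 5 columns)
targets = np.array(letters)[:, None, None]   # (4, 1, 1)

hits = cells == targets        # broadcasts to (4, 5, 5)
per_letter = hits.any(axis=2)  # (4, 5): letter k appears somewhere in row i
mask = per_letter.all(axis=0)  # (5,): every letter appears in row i

print(hits.shape, per_letter.shape, mask.shape)  # (4, 5, 5) (4, 5) (5,)
print(df[mask]['word'].tolist())                 # ['DEPTH']
```

The intermediate array has n_letters × n_rows × n_cols elements, which is negligible here but grows with the size of the frame.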
Answer:
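Another option is to use numpy.isin (numpy.in1d gives the same result here but is deprecated as of NumPy 2.0):
import numpy as np

df[df.word.apply(lambda x: np.isin(letters, list(x)).all())]
output: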
letter_1 letter_2 letter_3 letter_4 letter_5 word
3 D E P T H DEPTH
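With np.isin (and np.in1d) the argument order matters: the first argument is the array being tested, so the letters must come first. A small illustration on DEBUT, which lacks an H:

```python
import numpy as np

letters = ['E', 'H', 'T', 'D']
word = 'DEBUT'

# Is each target letter present in the word? (what the filter needs)
letters_in_word = np.isin(letters, list(word))
print(letters_in_word)        # [ True False  True  True]
print(letters_in_word.all())  # False, so the row is dropped

# Reversed: is each character of the word one of the targets? (a different question)
word_in_letters = np.isin(list(word), letters)
print(word_in_letters)        # [ True  True False False  True]
```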
Answer:
Another method, using a plain Python membership test on each word:
df[df['word'].apply(lambda x: all(s in x for s in letters))]
output:
letter_1 letter_2 letter_3 letter_4 letter_5 word
3 D E P T H DEPTH
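All four answers treat the list as a set of required letters, so a repeated entry such as ['E', 'E'] is satisfied by a single E. If repeats should count, here is a sketch using collections.Counter. The question's list has no duplicates, so the letters list below is hypothetical, and the frame is reduced to the word column for brevity.

```python
from collections import Counter

import pandas as pd

# Hypothetical list with a repeated letter: a word must contain at least two Es and a T
letters = ['E', 'E', 'T']
words = ['DEBUT', 'DEBUG', 'BEGET', 'DEPTH', 'DUVET']
df = pd.DataFrame({'word': words})

need = Counter(letters)

def has_all(word: str) -> bool:
    # Counter subtraction drops non-positive counts, so an empty result
    # means the word supplies every required letter at least as often
    return not (need - Counter(word))

print(df[df['word'].apply(has_all)]['word'].tolist())  # ['BEGET']
```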