I want to keep only the rows of dataframe df whose string in column "b" contains at least one of the strings in column "b2" of dataframe df2.
import pandas as pd
d = {'a': [100, 125, 300, 235], 'b': ["abc", "ghf", "dfg", "hij"]}
df = pd.DataFrame(data=d, index=[1, 2, 3, 4])
print(df)
a b
1 100 abc
2 125 ghf
3 300 dfg
4 235 hij
d2 = {'a2': [10, 25, 30], 'b2': ["bc", "fg", "op"]}
df2 = pd.DataFrame(data=d2, index=[1, 2, 3])
print(df2)
a2 b2
1 10 bc
2 25 fg
3 30 op
The output should look like this:
a b
1 100 abc
2 300 dfg
I tried the following but it did not work.
for majstring in df.b:
    for substring in set(df2.b2):
        if substring in majstring:
            pass
        else:
            df.drop(df.loc[df['b'] == majstring], inplace=True)
CodePudding user response:
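Try this. It builds one boolean mask per substring with str.contains, adds the masks together, and converts the sum to bool, so a row is kept if it matches at least one substring: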
mask = sum([df['b'].str.contains(v) for v in df2['b2']]).astype(bool)
filtered_df = df[mask]
Output:
>>> filtered_df
a b
1 100 abc
3 300 dfg
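A possible alternative, sketched under the assumption that the values in b2 should be matched as literal substrings rather than as regular expressions: join the escaped substrings into a single pattern and call str.contains once. The variable name pattern is illustrative only and does not appear in the question.

```python
import re
import pandas as pd

df = pd.DataFrame({'a': [100, 125, 300, 235],
                   'b': ["abc", "ghf", "dfg", "hij"]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({'a2': [10, 25, 30],
                    'b2': ["bc", "fg", "op"]}, index=[1, 2, 3])

# Assumption: b2 holds literal substrings, so escape any regex
# metacharacters before joining them into one "any of these" pattern.
pattern = "|".join(re.escape(s) for s in df2['b2'])

# str.contains returns a boolean Series aligned with df's index.
mask = df['b'].str.contains(pattern, regex=True)
filtered_df = df[mask]
print(filtered_df)
```

This scans column b once instead of once per substring, which may matter when df2 is large. Note that an empty b2 would produce an empty pattern that matches every row, so that case would need separate handling.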