Home > Software design >  Search a pandas column in df2 for text contained in each cell of a separate pandas df column (df1),
Search a pandas column in df2 for text contained in each cell of a separate pandas df column (df1),

Time:09-28

Using the sample tables below I'm looking for each cell in df1['A'] to be searched for in df2['name'] (maybe using str.endswith), then for the value in the df2 'Code' cell on the same row of the match to be added to the relevant row in df1

df1:

A
Test - Search
Test - Search2
Another Search
Test - Search

df2:

name Code
blah blah Test - Search AAA100
And another www456
blah blah Search text HGT100
random text Test - Search2 BBB100
Another Search dsw542

Required Result of df1:

A Code
Test - Search AAA100
Test - Search2 BBB100
Another Search dsw542
Test - Search AAA100

Ideally this can be accomplished without iterating through the DataFrame but not sure if this is possible and if it is not possible still looking for the best solution using iteration.

CodePudding user response:

Not sure if this is the best optimal way, as I did with iteration only;

df1 = pd.DataFrame({"A":['Test - Search','Test - Search2','Another Search','Test - Search']})
df2 = pd.DataFrame({"name":['blah blah Test - Search','And another','blah blah Search text','random text Test - Search2','Another Search'],
                    "code":['AAA100','www456','HGT100','BBB100','dsw542']})

code = [] 
for text in df1["A"]:
    df2["temp"] = df2["name"].apply(lambda x: x[-len(text):])
    df3 = df2[df2["temp"] == text]
    if df3.shape[0] > 0:
        code.append(df3.iloc[0,1])
    else:
        code.append("")
df1["code"] = code

# Output of df1

                A    code
0   Test - Search  AAA100
1  Test - Search2  BBB100
2  Another Search  dsw542
3   Test - Search  AAA100

Hope this Helps...

CodePudding user response:

Here you go. Using A extract and dropna to achieve the desired results.

(df2
 .assign(name=lambda x: x.name.str.extract(f"({'|'.join(x for x in sorted(df1.A, key=len, reverse=True))})"))
 .dropna()
)
  • Related