Append column value if string is contained in another string-CodePudding

I want to add a new column a3 to my dataframe df: If the strings of "b" contain strings of "b2" from dataframe df2, the new column a3 should append values from a2 of df2.

first dataframe df:


d = {'a': [100, 300], 'b': ["abc", "dfg"]}
df = pd.DataFrame(data=d, index=[1, 2])

print(df)
     a    b
1  100  abc
2  300  dfg

second dataframe df2:


d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}
df2 = pd.DataFrame(data=d2, index=[1, 2, 3])

print(df2)
   a2  b2
1  L1  bc
2  L2  op
3  L3  fg

The output should look like this:

print(df)
     a    b   a3
1  100  abc   L1
2  300  dfg   L3

I tried a nested for loop, which did not work.

for i in df.b:
   for ii in df2.b2:
       for iii in df2.a3:
           if ii in i:
              df["a3"]=iii

CodePudding user response：

You need to test all combinations. You could still take advantage of pandas vector str.contains:

common = (pd.DataFrame({x: df['b'].str.contains(x) for x in df2['b2']})
   .replace({False: pd.NA})
   .stack()
   .reset_index(level=1, name='b2')['level_1'].rename('b2')
)
# 1    bc
# 2    fg
# Name: b2, dtype: object

df.join(common).merge(df2, on='b2')

output:

     a    b  b2  a2
0  100  abc  bc  L1
1  300  dfg  fg  L3

CodePudding user response：

Among a lof of approaches, you can use list comprehension:

df["a2"] = [df2.iloc[i]["a2"] for y in df.b for i,x in enumerate(df2.b2) if x in y]
df

Output

	a	b	a2
1	100	abc	L1
2	300	dfg	L3

And note that, it shouldn't be d2 = {'a2': [10, 30, 25], 'b2': ["bc", "op", "fg"]}, rather it should be d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}.