I want to add a new column a3 to my dataframe df: If the strings of "b" contain strings of "b2" from dataframe df2, the new column a3 should append values from a2 of df2.
first dataframe df:
d = {'a': [100, 300], 'b': ["abc", "dfg"]}
df = pd.DataFrame(data=d, index=[1, 2])
print(df)
a b
1 100 abc
2 300 dfg
second dataframe df2:
d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}
df2 = pd.DataFrame(data=d2, index=[1, 2, 3])
print(df2)
a2 b2
1 L1 bc
2 L2 op
3 L3 fg
The output should look like this:
print(df)
a b a3
1 100 abc L1
2 300 dfg L3
I tried a nested for loop, which did not work.
for i in df.b:
for ii in df2.b2:
for iii in df2.a3:
if ii in i:
df["a3"]=iii
CodePudding user response:
You need to test all combinations. You could still take advantage of pandas vector str.contains
:
common = (pd.DataFrame({x: df['b'].str.contains(x) for x in df2['b2']})
.replace({False: pd.NA})
.stack()
.reset_index(level=1, name='b2')['level_1'].rename('b2')
)
# 1 bc
# 2 fg
# Name: b2, dtype: object
df.join(common).merge(df2, on='b2')
output:
a b b2 a2
0 100 abc bc L1
1 300 dfg fg L3
CodePudding user response:
Among a lof of approaches, you can use list comprehension:
df["a2"] = [df2.iloc[i]["a2"] for y in df.b for i,x in enumerate(df2.b2) if x in y]
df
Output
a | b | a2 | |
---|---|---|---|
1 | 100 | abc | L1 |
2 | 300 | dfg | L3 |
And note that, it shouldn't be d2 = {'a2': [10, 30, 25], 'b2': ["bc", "op", "fg"]}
, rather it should be d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}
.