I have two dataframes that are as follows:
Dataframe df1
:
Rep
0 ec21b_AI_154OH
1 m2010_AI_066UW
2 20wh1_DS_416FC
Dataframe df2
:
Address FirstPart SecondPart
0 address13 m2010 066UW
1 address22 2020e 999GV
2 address26 2020c 513DT
3 address35 evd18 874GO
4 address36 ep21b 986CG
5 address493 20wh1 416FC
6 address628 ec21b 154OH
I want to add an Address
column in dataframe df1
using pandas, so that it looks as follows:
Rep Address
0 ec21b_AI_154OH address628
1 m2010_AI_066UW address13
2 20wh1_DS_416FC address493
For every row in df2
, I can search for matches in df1
, and put the address.
However, is there a better way to do it?
My minimum working example is as follows:
for First, Second, Add in zip(list(df2['FirstPart']),list(df2['SecondPart']),list(df2['Address'])):
condition = df1['Rep'].str.contains(First) & df1['Rep'].str.contains(Second)
df1.loc[condition,"Address"] = Add
Note: It is not necessary to find _
as a delimiter, and there can be other delimiters as well.
CodePudding user response:
IIUC, you can extract
the two part to use for a merge
(here with named capturing groups, although optional if you use the left_on
/right_on
parameters or merge
):
out = df1.join(
df1['Rep']
.str.extract(r'^(?P<FirstPart>[^_] ).*?(?P<SecondPart>[^_] )$')
.merge(df2, how='left')['Address']
)
output:
Rep Address
0 ec21b_AI_154OH address628
1 m2010_AI_066UW address13
2 20wh1_DS_416FC address493
CodePudding user response:
you can just loop trought the both columns:
Those are you dfs, on the first df just add the new column Address:
import pandas as pd
df_dict_1 = {
'Rep':['ec21b_AI_154OH', 'm2010_AI_066UW', '20wh1_DS_416FC'],
'Address': [None, None, None]
}
df_dict_2 = {
'Address':['address13', 'address22', 'address26', 'address35', 'address36',\
'address493', 'address628' ],
'FirstPart':['m2010', '2020e', '2020c', 'evd18', 'ep21b',\
'20wh1', 'ec21b' ],
'SecondPart':['066UW', '999GV', '513DT', '874GO', '986CG',\
'416FC', '154OH' ],
}
df1 = pd.DataFrame(df_dict_1)
df2 = pd.DataFrame(df_dict_2)
And loop trought the second one:
rep_list = list(df1.Rep)
for x in range(df2.shape[0]):
first_part = df2.FirstPart.iloc[x]
second_part = df2.SecondPart.iloc[x]
rep = ''.join([x for x in rep_list if first_part in x and second_part in x])
adress = df2.Address.iloc[x]
for x in range(len(df1.Rep)):
if rep ==df1.Rep[x]:
df1.Address.iloc[x] = adress
Output:
Rep Address
0 ec21b_AI_154OH address628
1 m2010_AI_066UW address13
2 20wh1_DS_416FC address493