I have following 2 dataframes -
df1:
job e_msg
0 f_maf_crm_customer_request_global_jgvcc permission denied for relation f_maf_custome
df2:
master_job error_msg current_count
0 JGCC_Genesys_Conversations_Aggregates
1 f_maf_crm_customer_request_global_jgvcc 100
if df1 is having the same job name so df2 should update error_msg with df1.e_msg and current_count should change as 0.
In final output:
master_job error_msg current_count
0 JGCC_Genesys_Conversations_Aggregates
1 f_maf_crm_customer_request_global_jgvcc permission denied for relation f_maf_custome 0
CodePudding user response:
A simple approach would be this one liner.
df2.loc[df2.master_job.isin(df1.job), ['error_msg', 'current_count']] = [df1['e_msg'].values, 0]
This gives the output you want. Please note that I have abbreviated the error_msg
for brevity. You can test it out yourself and see that the output message is the same as the one you are using.
job error_msg current_count
0 JGCC_Genesys_Conversations_Aggregates
1 f_maf_crm_customer_request_global_jgvcc permission denied... 0.0
What that single line of code does is quite simple. It checks if the records in df1
exist in df2
and if it does, then the values in df2
are changed accordingly to match your requirement. You can read more about isin
here.
CodePudding user response:
You can try an outer join on the two dataframes from the merge
function in pandas
. If you don't know what joins are then refer to this webpage.
Code -
merged_df = pd.merge(df1, df2, on = "master_job", how = "outer")
merged_df
Output -
master_job | error_msg_x | current_count | |
---|---|---|---|
0 | JGCC_Genesys_Conversations_Aggregates | nan | |
1 | f_maf_crm_customer_request_global_jgvcc | permission denied for relation f_maf_custome | 100 |
Here we are getting two error_msg
columns because error_msg_x
is from df1
and error_msg_y
is from df2
. Because error_msg_y
isn't of any use to use, we can drop it using the code below.
merged_df = merged_df.drop(columns = "error_msg_y")
We can merge the two dataframes and drop the error_msg_y
column in just one line using this code,
merged_df = pd.merge(df1, df2, on = "master_job", how = "right").drop(columns = "error_msg_y")