Home > Enterprise >  How to update the data frame column values from another data frame based a conditional match in pand
How to update the data frame column values from another data frame based a conditional match in pand

Time:07-25

I have two dataframes as:

df_A:

{'last_name': {0: 'Williams', 1: 'Henry', 2: 'XYX', 3: 'Smith', 4: 'David', 5: 'Freeman', 6: 'Walter', 7: 'Test_A', 8: 'Mallesham', 9: 'Mallesham', 10: 'Henry', 11: 'Smith'}, 'first_name': {0: 'Henry', 1: 'Williams', 2: 'ABC', 3: 'David', 4: 'Smith', 5: 'Walter', 6: 'Freeman', 7: 'Test_B', 8: 'Yamulla', 9: 'Yamulla', 10: 'Williams', 11: 'David'}, 'full_name': {0: 'Williams Henry', 1: 'Henry Williams', 2: 'XYX ABC', 3: 'Smith David', 4: 'David Smith', 5: 'Freeman Walter', 6: 'Walter Freeman', 7: 'Test_A Test_B', 8: 'Mallesham Yamulla', 9: 'Mallesham Yamulla', 10: 'Henry Williams', 11: 'Smith David'}, 'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-11', 2: 'NAME_GROUP-12', 3: 'NAME_GROUP-13', 4: 'NAME_GROUP-13', 5: 'NAME_GROUP-14', 6: 'NAME_GROUP-14', 7: 'NAME_GROUP-15', 8: 'NAME_GROUP-16', 9: 'NAME_GROUP-16', 10: 'NAME_GROUP-11', 11: 'NAME_GROUP-13'}}
    last_name first_name          full_name name_unique_identifier
0    Williams      Henry     Williams Henry          NAME_GROUP-11
1       Henry   Williams     Henry Williams          NAME_GROUP-11
2         XYX        ABC            XYX ABC          NAME_GROUP-12
3       Smith      David        Smith David          NAME_GROUP-13
4       David      Smith        David Smith          NAME_GROUP-13
5     Freeman     Walter     Freeman Walter          NAME_GROUP-14
6      Walter    Freeman     Walter Freeman          NAME_GROUP-14
7      Test_A     Test_B      Test_A Test_B          NAME_GROUP-15
8   Mallesham    Yamulla  Mallesham Yamulla          NAME_GROUP-16
9   Mallesham    Yamulla  Mallesham Yamulla          NAME_GROUP-16
10      Henry   Williams     Henry Williams          NAME_GROUP-11
11      Smith      David        Smith David          NAME_GROUP-13

df_B:

{'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-13', 2: 'NAME_GROUP-14'}, 'full_name': {0: 'Henry Williams', 1: 'Smith David', 2: 'Freeman Walter'}, 'last_name': {0: 'Henry', 1: 'Smith', 2: 'Freeman'}, 'first_name': {0: 'Williams', 1: 'David', 2: 'Walter'}}
  name_unique_identifier       full_name last_name first_name
0          NAME_GROUP-11  Henry Williams     Henry   Williams
1          NAME_GROUP-13     Smith David     Smith      David
2          NAME_GROUP-14  Freeman Walter   Freeman     Walter

Here wherever the name_unique_identifier exists in df_A and df_B, df_A dataframe column's last_name,first_name to be filled in with df_B last_name,first_name, the non matched entries not required to be updated.

Example:

NAME_GROUP-14 exists in df_A and df_B. So last_name and first_name in df_A for this identifier should be as 'Freeman','Walter'.

As I'm dealing with millions of records, an efficient technique is needed.

CodePudding user response:

You can check each unique value in column=name_unique_identifier from df_B where exist in df_A and then insert the value from df_B to df_A.

col = 'name_unique_identifier'
for val in df_B[col]:
    msk_A = df_A[col].eq(val)
    msk_B = df_B[col].eq(val)
    df_A.loc[msk, ['last_name', 'first_name']] = df_B.loc[msk_B, ['last_name', 'first_name']].values

# If you want to update 'full_name' base new values of 'last_name' and 'first_name'
df_A['full_name'] = df_A['last_name']   " "   df_A['first_name']
print(df_A)

    last_name first_name          full_name name_unique_identifier
0    Williams      Henry     Williams Henry          NAME_GROUP-11
1       Henry   Williams     Henry Williams          NAME_GROUP-11
2         XYX        ABC            XYX ABC          NAME_GROUP-12
3       Smith      David        Smith David          NAME_GROUP-13
4       David      Smith        David Smith          NAME_GROUP-13
5     Freeman     Walter     Freeman Walter          NAME_GROUP-14
6     Freeman     Walter     Freeman Walter          NAME_GROUP-14
7      Test_A     Test_B      Test_A Test_B          NAME_GROUP-15
8   Mallesham    Yamulla  Mallesham Yamulla          NAME_GROUP-16
9   Mallesham    Yamulla  Mallesham Yamulla          NAME_GROUP-16
10      Henry   Williams     Henry Williams          NAME_GROUP-11
11      Smith      David        Smith David          NAME_GROUP-13

CodePudding user response:

This pandas solution might work for you:

df_A = pd.DataFrame({'last_name': {0: 'Williams', 1: 'Henry', 2: 'XYX', 3: 'Smith', 4: 'David', 5: 'Freeman', 6: 'Walter', 7: 'Test_A', 8: 'Mallesham', 9: 'Mallesham', 10: 'Henry', 11: 'Smith'}, 'first_name': {0: 'Henry', 1: 'Williams', 2: 'ABC', 3: 'David', 4: 'Smith', 5: 'Walter', 6: 'Freeman', 7: 'Test_B', 8: 'Yamulla', 9: 'Yamulla', 10: 'Williams', 11: 'David'}, 'full_name': {0: 'Williams Henry', 1: 'Henry Williams', 2: 'XYX ABC', 3: 'Smith David', 4: 'David Smith', 5: 'Freeman Walter', 6: 'Walter Freeman', 7: 'Test_A Test_B', 8: 'Mallesham Yamulla', 9: 'Mallesham Yamulla', 10: 'Henry Williams', 11: 'Smith David'}, 'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-11', 2: 'NAME_GROUP-12', 3: 'NAME_GROUP-13', 4: 'NAME_GROUP-13', 5: 'NAME_GROUP-14', 6: 'NAME_GROUP-14', 7: 'NAME_GROUP-15', 8: 'NAME_GROUP-16', 9: 'NAME_GROUP-16', 10: 'NAME_GROUP-11', 11: 'NAME_GROUP-13'}})
df_B = pd.DataFrame({'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-13', 2: 'NAME_GROUP-14'}, 'full_name': {0: 'Henry Williams', 1: 'Smith David', 2: 'Freeman Walter'}, 'last_name': {0: 'Henry', 1: 'Smith', 2: 'Freeman'}, 'first_name': {0: 'Williams', 1: 'David', 2: 'Walter'}})
df_A.update(pd.merge(df_A, df_B, how='left', on='name_unique_identifier').drop(['last_name_x', 'first_name_x', 'full_name_x'], axis=1).rename(columns={'full_name_y':'full_name', 'last_name_y':'last_name', 'first_name_y':'first_name'}))
print(df_A)

Result:

    last_name first_name          full_name name_unique_identifier
0       Henry   Williams     Henry Williams          NAME_GROUP-11
1       Henry   Williams     Henry Williams          NAME_GROUP-11
2         XYX        ABC            XYX ABC          NAME_GROUP-12
3       Smith      David        Smith David          NAME_GROUP-13
4       Smith      David        Smith David          NAME_GROUP-13
5     Freeman     Walter     Freeman Walter          NAME_GROUP-14
6     Freeman     Walter     Freeman Walter          NAME_GROUP-14
7      Test_A     Test_B      Test_A Test_B          NAME_GROUP-15
8   Mallesham    Yamulla  Mallesham Yamulla          NAME_GROUP-16
9   Mallesham    Yamulla  Mallesham Yamulla          NAME_GROUP-16
10      Henry   Williams     Henry Williams          NAME_GROUP-11
11      Smith      David        Smith David          NAME_GROUP-13
  • Related