Home > Software engineering >  Update dataframe column to another dataframe's column on condition
Update dataframe column to another dataframe's column on condition

Time:10-14

Let's say I have 2 dataframes:

#DF1

   1  2  3  4  5
0  A  B  C  D  E
1  D  A  B  K  J
2  B  D  A  A  A
3  C  A  B  K  J

and

#DF2

  ID  SUB_VALUE
0  B  REPLACE_X
1  C  REPLACE_Y

I want to update all the record values in column 1 (DF1), if that value is present in the ID column of DF2. The updated value should be the corresponding SUB_VALUE of DF2.

So the results should look like this:

           1  2  3  4  5
0          A  B  C  D  E
1          D  A  B  K  J
2  REPLACE_X  D  A  A  A
3  REPLACE_y  A  B  K  J

(Because B and C from column 1 are present in the second df, and the corresponding sub_values are REPLACE_X AND REPLACE_Y.)

What I tried:

#creating datasets
keys = {"B": "REPLACE_X",
        "C": "REPLACE_Y"}
helper_df = pd.DataFrame(keys.items(), columns=["ID", "SUB_VALUE"])

df_lst = [["A", "B", "C", "D", "E"],
          ["D", "A", "B", "K", "J"],
          ["B", "D", "A", "A", "A"],
          ["C", "A", "B", "K", "J"]]
df = pd.DataFrame(df_lst,
                  columns = ["1", "2", "3", "4", "5"])

#creating mask for filtering
m = (df["1"].isin(helper_df["ID"].tolist()))
df.loc[m, "1"] = #and here is where I have no idea how to find the corresponding value

CodePudding user response:

using mask, to update only if the value is present in the helper_df

df['1'] = df['1'].mask(
    (df['1'].map(helper_df.set_index(['ID'])['SUB_VALUE'])).notna(),
    (df['1'].map(helper_df.set_index(['ID'])['SUB_VALUE']))
)
df

1   2   3   4   5
0   A   B   C   D   E
1   D   A   B   K   J
2   REPLACE_X   D   A   A   A
3   REPLACE_Y   A   B   K   J

CodePudding user response:

If you have a dataframe the following line should make the job

df["1"] = df["1"].replace(helper_df.set_index('ID').to_dict()["SUB_VALUE"])

output


    1   2   3   4   5
0   A   B   C   D   E
1   D   A   B   K   J
2   REPLACE_X   D   A   A   A
3   REPLACE_Y   A   B   K   J

if you have your values already in a dcitionary you can directly do

keys = {"B": "REPLACE_X",
        "C": "REPLACE_Y"}
df["1"] = df["1"].replace(keys)
  • Related