Let's say I have 2 dataframes:
#DF1
1 2 3 4 5
0 A B C D E
1 D A B K J
2 B D A A A
3 C A B K J
and
#DF2
ID SUB_VALUE
0 B REPLACE_X
1 C REPLACE_Y
I want to update all the record values in column 1 (DF1), if that value is present in the ID column of DF2. The updated value should be the corresponding SUB_VALUE of DF2.
So the results should look like this:
1 2 3 4 5
0 A B C D E
1 D A B K J
2 REPLACE_X D A A A
3 REPLACE_y A B K J
(Because B and C from column 1 are present in the second df, and the corresponding sub_values are REPLACE_X AND REPLACE_Y.)
What I tried:
#creating datasets
keys = {"B": "REPLACE_X",
"C": "REPLACE_Y"}
helper_df = pd.DataFrame(keys.items(), columns=["ID", "SUB_VALUE"])
df_lst = [["A", "B", "C", "D", "E"],
["D", "A", "B", "K", "J"],
["B", "D", "A", "A", "A"],
["C", "A", "B", "K", "J"]]
df = pd.DataFrame(df_lst,
columns = ["1", "2", "3", "4", "5"])
#creating mask for filtering
m = (df["1"].isin(helper_df["ID"].tolist()))
df.loc[m, "1"] = #and here is where I have no idea how to find the corresponding value
CodePudding user response:
using mask, to update only if the value is present in the helper_df
df['1'] = df['1'].mask(
(df['1'].map(helper_df.set_index(['ID'])['SUB_VALUE'])).notna(),
(df['1'].map(helper_df.set_index(['ID'])['SUB_VALUE']))
)
df
1 2 3 4 5
0 A B C D E
1 D A B K J
2 REPLACE_X D A A A
3 REPLACE_Y A B K J
CodePudding user response:
If you have a dataframe the following line should make the job
df["1"] = df["1"].replace(helper_df.set_index('ID').to_dict()["SUB_VALUE"])
output
1 2 3 4 5
0 A B C D E
1 D A B K J
2 REPLACE_X D A A A
3 REPLACE_Y A B K J
if you have your values already in a dcitionary you can directly do
keys = {"B": "REPLACE_X",
"C": "REPLACE_Y"}
df["1"] = df["1"].replace(keys)