Home > database >  Replace values of a dataframe using another dataframe with different size
Replace values of a dataframe using another dataframe with different size

Time:09-28

I have a dataframe (df1) that contains information about project_id, cost_center and other features:

project_id cost_center month
101 8575 3
321 8597 4
321 8597 2
Nan 8522 1

Sometimes the project_id is not included there, so is Nan, and for these cases I have a "mapping table" (df2) that indicates the project_id that should be associated to that cost center:

project_id cost_center
832 8522

So in my example, I should be able to replace the Nan in df1 for a 832. It means, I should replace the project_id in df1, when the cost_center is in df2.

I tried the following code, but is not working. It says "Length of values (0) does not match length of index (565)" I think because df1 and df2 have different sizes

df['project_id'] = df_mapping[df['cost_center'].isin(df_mapping['cost_center'])]['project_id'].values

CodePudding user response:

You can use a mapping series and fillna:

df1['project_id'] = (df1['project_id']
                     .fillna(df1['cost_center'].map(df2.set_index('cost_center')['project_id']),
                             downcast='infer'
                            )
                    )

output:

   project_id  cost_center  month
0         101         8575      3
1         321         8597      4
2         321         8597      2
3         832         8522      1

CodePudding user response:

one way to do this is to merge the 2 DFs and then use fillna() to create a new (output) column. Hope it helps!

df_1 = pd.DataFrame({"id":[1,2,3,None],"center":[5,6,7,8]},index=["a","b","c","d"])
df_2 = pd.DataFrame({"id":[4],"center":[8]},index=["g"])
df_merge = df_1.merge(df_2,on="center",how="outer")
df_merge["id_output"] = df_merge["id_x"].fillna(df_merge["id_y"])
df_merge.drop(["id_x","id_y"],inplace=True, axis=1)
  • Related