I have a dataframe (df1) that contains information about project_id, cost_center and other features:
project_id | cost_center | month |
---|---|---|
101 | 8575 | 3 |
321 | 8597 | 4 |
321 | 8597 | 2 |
Nan | 8522 | 1 |
Sometimes the project_id is not included there, so is Nan, and for these cases I have a "mapping table" (df2) that indicates the project_id that should be associated to that cost center:
project_id | cost_center |
---|---|
832 | 8522 |
So in my example, I should be able to replace the Nan in df1 for a 832. It means, I should replace the project_id in df1, when the cost_center is in df2.
I tried the following code, but is not working. It says "Length of values (0) does not match length of index (565)" I think because df1 and df2 have different sizes
df['project_id'] = df_mapping[df['cost_center'].isin(df_mapping['cost_center'])]['project_id'].values
CodePudding user response:
You can use a mapping series and fillna
:
df1['project_id'] = (df1['project_id']
.fillna(df1['cost_center'].map(df2.set_index('cost_center')['project_id']),
downcast='infer'
)
)
output:
project_id cost_center month
0 101 8575 3
1 321 8597 4
2 321 8597 2
3 832 8522 1
CodePudding user response:
one way to do this is to merge the 2 DFs and then use fillna() to create a new (output) column. Hope it helps!
df_1 = pd.DataFrame({"id":[1,2,3,None],"center":[5,6,7,8]},index=["a","b","c","d"])
df_2 = pd.DataFrame({"id":[4],"center":[8]},index=["g"])
df_merge = df_1.merge(df_2,on="center",how="outer")
df_merge["id_output"] = df_merge["id_x"].fillna(df_merge["id_y"])
df_merge.drop(["id_x","id_y"],inplace=True, axis=1)