I have 2 dataframes of different sizes. The first dataframe(df1
) has 4 columns, but two of those columns have the same name as the columns in the second dataframe(df2
), which is only comprised of 2 columns. The columns in common are ['ID']
and ['Department']
.
I want to check if any ID
from df2
are in df1
. If so, I want to replace df1['Department']
value with df2['Department']
value.
For instance, df1
looks something like this:
ID Department Yrs Experience Education
1234 Science 1 Bachelors
2356 Art 3 Bachelors
2456 Math 2 Masters
4657 Science 4 Masters
And df2
looks something like this:
ID Department
1098 P.E.
1234 Technology
2356 History
I want to check if the ID
from df2
is in df1
and if so, update Department
. The output should looks something like this:
ID Department Yrs Experience Education
1234 **Technology** 1 Bachelors
2356 **History** 3 Bachelors
2456 Math 2 Masters
4657 Science 4 Masters
The expected updates to df1
are in bold
Is there an efficient way to do this?
Thank you for taking the time to read this and help.
CodePudding user response:
You can use ID
of df1
to map with the Pandas series formed by setting ID
on df2
as index and taking the column of Department
from df2
(this acts as a mapping table).
Then, in case of no match of ID
from df2
, we fill-in the original values of Department
from df1
(to retain original values in case of no match):
df1['Department'] = (df1['ID'].map(df2.set_index('ID')['Department'])
.fillna(df1['Department'])
)
Result:
print(df1)
ID Department Yrs Experience Education
0 1234 Technology 1 Bachelors
1 2356 History 3 Bachelors
2 2456 Math 2 Masters
3 4657 Science 4 Masters
CodePudding user response:
Try:
df1["Department"].update(
df1[["ID"]].merge(df2, on="ID", how="left")["Department"]
)
print(df1)
Prints:
ID Department Yrs Experience Education
0 1234 Technology 1 Bachelors
1 2356 History 3 Bachelors
2 2456 Math 2 Masters
3 4657 Science 4 Masters
CodePudding user response:
df_1 = pd.DataFrame(data={'ID':[1234, 2356, 2456, 4657], 'Department':['Science', 'Art', 'Math', 'Science']})
df_2 = pd.DataFrame(data={'ID':[1234, 2356], 'Department':['Technology', 'History']})
df_1.loc[df_1['ID'].isin(df_2['ID']), 'Department'] = df_2['Department']
OutPut
ID Department
0 1234 Technology
1 2356 History
2 2456 Math
3 4657 Science