I have two pandas DataFrames, where the first DataFrame has two columns: "a" and "id" and the second DataFrame has two columns: "id" and "color_value". I would like to compare the ids between the two DataFrames and if there is a match, add a column to the first DataFrame with the correct color_value. In essence, I am using the second DataFrame as a lookup table to find the right ID and match it to its corresponding color value.
See below.
df_1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'id': ['001', '002', '001', '004', '003']})
df_2 = pd.DataFrame({'id': ['001', '002', '003', '004'], 'color_value': ['blue', 'red', 'yellow', 'orange']})
so df1:
a id
1 001
2 002
3 001
4 004
5 003
df2:
id color_value
001 blue
002 red
003 yellow
004 orange
I need a way to match up the IDs from df1 and df2 and add the color_value column to df1, so the result is:
df1:
a id color_value
1 001 blue
2 002 red
3 001 blue
4 004 orange
5 003 yellow
CodePudding user response:
You can form a mapping by setting df_2
with id
as index using .set_index()
(acts as keys of the mapping) and take the color_value
column (acts as values of the mapping).
Then, you can map id
in df_1
to the mapping by using .map()
, as follows:
df_1['color_value'] = df_1['id'].map(df_2.set_index('id')['color_value'])
or breaking it down into 2 lines for better readability/ easier understanding, as follows:
mapping = df_2.set_index('id')['color_value']
df_1['color_value'] = df_1['id'].map(mapping)
Result:
print(df_1)
a id color_value
0 1 001 blue
1 2 002 red
2 3 001 blue
3 4 004 orange
4 5 003 yellow
CodePudding user response:
This is exactly what a pandas merge does! There are different types of merges, but the one you're looking for is called a "left" merge. That means that in the final dataframe, all of the rows in your first dataframe will be there, joined with only the rows in your second dataframe that have a matching id.
Imagine that your first dataframe is called the "left" dataframe, and your second one is the "right" dataframe. Your id column acts as the "key," holding values that might match between the two dataframes.
This is how you could use the merge function to accomplish your goal:
merged_df = df_1.merge(df_2, how='left', on='id')
The "how" parameter is where you input what kind of merge you want to do (a left merge). The "on" parameter is where you put the key column name (the id column).
Result:
df_1:
a id
0 1 001
1 2 002
2 3 001
3 4 004
4 5 003
df_2:
color_value id
0 blue 001
1 red 002
2 yellow 003
3 orange 004
merged_df:
a id color_value
0 1 001 blue
1 2 002 red
2 3 001 blue
3 4 004 orange
4 5 003 yellow
For more info:
Docs for the merge function (the examples are helpful): Pandas.DataFrame.merge
And a post with more info about different types of merges: Pandas Merging 101