I am looking up the prices of products based on id_number
in another DataFrame called df
and adding the prices of each product in my df_pair
DataFrame. I am doing this right now in the following way:
df_pair['price_p1'] = df_pair['p1'].progress_apply(lambda x: df[df['id_number'] == x]['price'].iloc[0])
But it seems, it's too slow. Is there any other better solution to solve this?
My two DataFrame sample:
df
:
id_number | price | ... |
---|---|---|
B0000D9MYM | 12.5 | |
B0000D9MYL | 25.26 | |
B00ADHIGBA | 8.9 | |
B00H9OX598 | 10.56 | |
... | .... |
df_pair
looks like below:
p1 | ... |
---|---|
B002HQCWYM | |
B00H9OX598 | |
B0000D9MYL | |
... |
Note: the indices or the order of the id_numbers
and p1
values are not the same.
CodePudding user response:
df_pair.merge(df.rename(columns={"id_number":"p1"}), how="left")
you can merge 2 dataframes by merge func but need same column name. (or use left_on & right_on)
my eng is not good. so hard to exlain long. if you have more question. tell me.
if df has many columns, slice df and use df[["id_number", "price"]] instead of df
and how can i make grey block on df? i don want make bold anymore
CodePudding user response:
Assuming id_number
is the index of df
you can do something like:
df_pair.join(df, on='p1')['price'].sum()
A join is a standard database operation that basically merges two data frames by linking a column of one to the index of other.
CodePudding user response:
Another option is to set_index
with "id_number" and map
the price
:
df_pair['price'] = df_pair['p1'].map(df1.set_index('id_number')['price'])
Output:
p1 ... price
0 B002HQCWYM ... NaN
1 B00H9OX598 ... 10.56
2 B0000D9MYL ... 25.26
3 ...