My df1 looks like this:It contains 3 unique project id
.The date
starts on 01-01-22
and ends on 01-12-28
id | date | p50 | p90 |
---|---|---|---|
apv1 | 01-01-22 | 1000 | 1000 |
apv2 | 01-01-22 | 1000 | 1000 |
tsso | 01-01-22 | 1202 | 2005 |
apv1 | 01-02-22 | 1000 | 2000 |
apv2 | 01-02-22 | 1400 | 5000 |
tsso | 01-02-22 | 200 | 1000 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
apv1 | 01-11-28 | 305 | 400 |
apv2 | 01-11-28 | 300 | 200 |
tsso | 01-11-28 | 250 | 499 |
apv1 | 01-12-28 | 100 | 290 |
apv2 | 01-12-28 | 145 | 650 |
tsso | 01-12-28 | 299 | 179 |
df2 looks like this:
mth | month | pct |
---|---|---|
01 | january | 0.042 |
02 | febuary | 0.055 |
03 | march | 0.089 |
04 | april | 0.097 |
05 | may | 0.10 |
06 | june | 0.11 |
07 | july | 0.127 |
08 | august | 0.114 |
09 | sept | 0.091 |
10 | oct | 0.068 |
11 | nov | 0.043 |
12 | dec | 0.038 |
I want to multiply each value p50
of df1 by the value pct
of df2 with mth
or month
value equals to the month of the date
in df1. Similarly for p90
.
The Final df should look like this:
id | date | p50 | p90 |
---|---|---|---|
apv1 | 01-01-22 | 1000*0.042 | 1000*0.042 |
apv2 | 01-01-22 | 1000*0.042 | 1000*0.042 |
tsso | 01-01-22 | 1202*0.042 | 2005*0.042 |
apv1 | 01-02-22 | 1000*0.055 | 2000*0.055 |
apv2 | 01-02-22 | 1400*0.055 | 5000*0.055 |
tsso | 01-02-22 | 200*0.055 | 1000*0.055 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
apv1 | 01-11-28 | 305*0.043 | 400*0.043 |
apv2 | 01-11-28 | 300*0.043 | 200*0.043 |
tsso | 01-11-28 | 250*0.043 | 499*0.043 |
apv1 | 01-12-28 | 100*0.038 | 290*0.038 |
apv2 | 01-12-28 | 145*0.038 | 650*0.038 |
tsso | 01-12-28 | 299*0.038 | 179*0.038 |
CodePudding user response:
Simply assign
month column using the date column. Then merge
both data frames by month. Finally, run multiplication with needed columns:
final_df = (
df1.assign(mth = lambda x: pd.to_datetime(x["date"]).dt.strftime("%m"))
.merge(df2, on="mth")
.assign(
p50 = lambda x: x["p50"].mul(x["pct"]),
p90 = lambda x: x["p90"].mul(x["pct"])
)
)
CodePudding user response:
You should be able to use map
to modify you data in place:
s = df2.set_index('mth')['pct']
pct = pd.to_datetime(df1['date']).dt.strftime('%m').map(s)
df1[['p50', 'p90']] *= pct