I have two pandas dataframes in python and need to do a measure based on date condition
DF1 has a column named VALUE that I need to divide based on column EXCHANGE from DF2
DF1:
MONTH VALUE
2021-01-01 50
2021-01-01 75
2021-01-01 100
2021-03-01 150
2021-04-01 100
2021-03-01 150
2021-03-01 150
DF2:
MONTH EXCHANGE
2021-01-01 4
2021-02-01 4
2021-03-01 2
2021-04-01 10
I NEED TO TRANSFORM THE DF1 INTO:
MONTH VALUE
2021-01-01 12.5 --> 50/4
2021-01-01 18.75 --> 75 (PREVIOUS VALUE) / 4 (EXCHANGE VALUE)
2021-01-01 25
2021-03-01 75
2021-04-01 10 --> 100/10
2021-03-01 75
2021-03-01 75
Thank you in advance.
CodePudding user response:
Pandas allows you to divide columns by columns. Assuming your DF1
and DF2
have the same length, you could create a new column called 'values'
:
DF1['values'] = DF1['VALUE']/DF2['EXCHANGE']
Output
DF1:
MONTH VALUE values
2021-01-01 50 12,5
2021-01-01 75 18, 75
2021-01-01 100 25
2021-03-01 150 75
2021-04-01 100 ...
2021-03-01 150 ...
2021-03-01 150 ...
Then, you could drop the initial VALUE
column:
DF1 = DF1.drop('VALUE', 1)
Output:
MONTH values
2021-01-01 12.5 --> 50/4
2021-01-01 18.75 --> 75 (PREVIOUS VALUE) / 4 (EXCHANGE VALUE)
2021-01-01 25
2021-03-01 75
2021-04-01 10
2021-03-01 75
2021-03-01 75
Edit:
For the solution asked in the comments to this answer. I would suggest creating a new empty EXCHANGE
column in DF1
.
DF1['EXCHANGE'] = ""
Output:
MONTH VALUE EXCHANGE
2021-01-01 50
2021-01-01 75
2021-01-01 100
2021-03-01 150
2021-04-01 100
2021-03-01 150
2021-03-01 150
And then use the loc method to add values to the existing column-based on the desired date-time condition.
DF1.loc[df1['MONTH'] == '2021-01-01', 'EXCHANGE'] = 4
DF1.loc[df1['MONTH'] == '2021-04-01', 'EXCHANGE'] = 10
Output:
MONTH VALUE EXCHANGE
2021-01-01 50 4
2021-01-01 75 4
2021-01-01 100 4
2021-03-01 150 ?
2021-04-01 100 10
2021-03-01 150 ?
2021-03-01 150 ?
Once you have done that, you can apply the above procedure, just using the two DF1 columns this time. Note: the '?'
in the DF1 EXCHANGE
column is there since I am not sure what value you want for the third month in the month column.
CodePudding user response:
Avoid going into loops and if-else. You can use "Merge" to combine the two DataFrames based on column values and divide as usual:
DF1 = DF1.merge(DF2, on=['MONTH'])
Next divide VALUE by EXCHANGE:
DF1['VALUE'] = DF1['VALUE'].div(DF1['EXCHANGE'])
Finally drop EXCHANGE column from DF1:
DF1.drop(['EXCHANGE'], axis=1, inplace=True)
print(DF1)
Final Output:
MONTH VALUE
1/1/2021 12.50
1/1/2021 18.75
1/1/2021 25.00
3/1/2021 75.00
3/1/2021 75.00
3/1/2021 75.00
4/1/2021 10.00
CodePudding user response:
If you don't care about retaining the original order of DF1
then
result = DF1.set_index('MONTH')['VALUE'].div(DF2.set_index('MONTH')['EXCHANGE']).dropna().to_frame('VALUE')
VALUE
MONTH
2021-01-01 12.50
2021-01-01 18.75
2021-01-01 25.00
2021-03-01 75.00
2021-03-01 75.00
2021-03-01 75.00
2021-04-01 10.00
Otherwise, saad_saeed's answer should work fine, optionally dropping the EXCHANGE
column.
result = DF1.merge(DF2, how='left', on='MONTH')
result['VALUE'] = result['VALUE'].div(result['EXCHANGE'])
MONTH VALUE EXCHANGE
0 2021-01-01 12.50 4
1 2021-01-01 18.75 4
2 2021-01-01 25.00 4
3 2021-03-01 75.00 2
4 2021-04-01 10.00 10
5 2021-03-01 75.00 2
6 2021-03-01 75.00 2
...or this one liner which also retains the original order
DF1['VALUE'] = DF1.merge(DF2, how='left', on='MONTH').apply(lambda x: x['VALUE'] / x['EXCHANGE'], axis=1)
MONTH VALUE
0 2021-01-01 12.50
1 2021-01-01 18.75
2 2021-01-01 25.00
3 2021-03-01 75.00
4 2021-04-01 10.00
5 2021-03-01 75.00
6 2021-03-01 75.00