Home > database >  I have two dataframes and I need to compare them to each other and calculate a measure
I have two dataframes and I need to compare them to each other and calculate a measure

Time:12-30

I have two pandas dataframes in python and need to do a measure based on date condition

DF1 has a column named VALUE that I need to divide based on column EXCHANGE from DF2


DF1: 

   MONTH     VALUE
2021-01-01    50
2021-01-01    75
2021-01-01    100
2021-03-01    150
2021-04-01    100
2021-03-01    150
2021-03-01    150 

DF2: 
    MONTH     EXCHANGE
2021-01-01     4
2021-02-01     4
2021-03-01     2
2021-04-01     10

I NEED TO TRANSFORM THE DF1 INTO:

   MONTH               VALUE
2021-01-01              12.5 --> 50/4
2021-01-01              18.75 --> 75 (PREVIOUS VALUE) / 4 (EXCHANGE VALUE)
2021-01-01              25
2021-03-01              75
2021-04-01              10   --> 100/10                     
2021-03-01              75
2021-03-01              75   

Thank you in advance.

CodePudding user response:

Pandas allows you to divide columns by columns. Assuming your DF1 and DF2 have the same length, you could create a new column called 'values':

DF1['values'] = DF1['VALUE']/DF2['EXCHANGE']

Output

DF1: 

   MONTH      VALUE    values
2021-01-01    50       12,5
2021-01-01    75       18, 75
2021-01-01    100      25
2021-03-01    150      75
2021-04-01    100      ...
2021-03-01    150      ...
2021-03-01    150      ...

Then, you could drop the initial VALUE column:

DF1 = DF1.drop('VALUE', 1)

Output:

   MONTH               values
2021-01-01              12.5 --> 50/4
2021-01-01              18.75 --> 75 (PREVIOUS VALUE) / 4 (EXCHANGE VALUE)
2021-01-01              25
2021-03-01              75
2021-04-01              10                        
2021-03-01              75
2021-03-01              75   

Edit:

For the solution asked in the comments to this answer. I would suggest creating a new empty EXCHANGE column in DF1.

DF1['EXCHANGE'] = ""


Output:
            MONTH      VALUE    EXCHANGE
            2021-01-01    50       
            2021-01-01    75       
            2021-01-01    100      
            2021-03-01    150      
            2021-04-01    100      
            2021-03-01    150      
            2021-03-01    150      

And then use the loc method to add values to the existing column-based on the desired date-time condition.

DF1.loc[df1['MONTH'] == '2021-01-01', 'EXCHANGE'] = 4
DF1.loc[df1['MONTH'] == '2021-04-01', 'EXCHANGE'] = 10

Output:
        MONTH      VALUE    EXCHANGE
        2021-01-01    50       4
        2021-01-01    75       4
        2021-01-01    100      4
        2021-03-01    150      ?
        2021-04-01    100      10
        2021-03-01    150      ?
        2021-03-01    150      ?

Once you have done that, you can apply the above procedure, just using the two DF1 columns this time. Note: the '?' in the DF1 EXCHANGE column is there since I am not sure what value you want for the third month in the month column.

CodePudding user response:

Avoid going into loops and if-else. You can use "Merge" to combine the two DataFrames based on column values and divide as usual:

DF1 = DF1.merge(DF2, on=['MONTH'])

Next divide VALUE by EXCHANGE:

DF1['VALUE'] = DF1['VALUE'].div(DF1['EXCHANGE'])

Finally drop EXCHANGE column from DF1:

DF1.drop(['EXCHANGE'], axis=1, inplace=True)
print(DF1)

Final Output:

MONTH  VALUE
1/1/2021  12.50
1/1/2021  18.75
1/1/2021  25.00
3/1/2021  75.00
3/1/2021  75.00
3/1/2021  75.00
4/1/2021  10.00

CodePudding user response:

If you don't care about retaining the original order of DF1 then

result = DF1.set_index('MONTH')['VALUE'].div(DF2.set_index('MONTH')['EXCHANGE']).dropna().to_frame('VALUE')

            VALUE
MONTH            
2021-01-01  12.50
2021-01-01  18.75
2021-01-01  25.00
2021-03-01  75.00
2021-03-01  75.00
2021-03-01  75.00
2021-04-01  10.00

Otherwise, saad_saeed's answer should work fine, optionally dropping the EXCHANGE column.

result = DF1.merge(DF2, how='left', on='MONTH')
result['VALUE'] = result['VALUE'].div(result['EXCHANGE'])

        MONTH  VALUE  EXCHANGE
0  2021-01-01  12.50         4
1  2021-01-01  18.75         4
2  2021-01-01  25.00         4
3  2021-03-01  75.00         2
4  2021-04-01  10.00        10
5  2021-03-01  75.00         2
6  2021-03-01  75.00         2

...or this one liner which also retains the original order

DF1['VALUE'] = DF1.merge(DF2, how='left', on='MONTH').apply(lambda x: x['VALUE'] / x['EXCHANGE'], axis=1)

        MONTH  VALUE
0  2021-01-01  12.50
1  2021-01-01  18.75
2  2021-01-01  25.00
3  2021-03-01  75.00
4  2021-04-01  10.00
5  2021-03-01  75.00
6  2021-03-01  75.00
  • Related