Home > Software design >  Pandas: Perform a Formula Calculation on Different Columns then Sum All of It
Pandas: Perform a Formula Calculation on Different Columns then Sum All of It

Time:03-16

I have set of data as follow

import pandas as pd 
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C', 
                                      'D', 'End_Date_D', 'E', 'End_Date_E'])

Output:
    Start_Date  A   End_Date_A  B   End_Date_B  C   End_Date_C  D   End_Date_D  E   End_Date_E
0   01-01-2015  234 25-05-2017  633 03-06-2016  935 30-10-2019  673 16-12-2020  825 06-07-2019

Then, I want to perform a calculation whereby for every end date of letter need to be minus with the Start_Date, then divide with 365 to get duration in years. Then the duration of each letter need to be use as a 'power of' of the value of their letter, respectively. Then the result of each letter need to be sum up to get the total.

Is there a way to perform a loop of this calculation?

CodePudding user response:

You need to use a parser. I'm assuming the data looks as you have posted.

For this example, I'm using dateutil - but you can use any other in a similar way.

import pandas as pd
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C',
                                      'D', 'End_Date_D', 'E', 'End_Date_E'])


from dateutil import parser
import math

letters = ["A", "B", "C", "D", "E"]
start_date = parser.parse(ds["Start_Date"][0])
total = 0

for letter in letters:
    end_date = parser.parse(ds["End_Date_"   letter][0])
    years = (end_date - start_date).days / 365
    power = math.pow(int(ds[letter][0]), years)
    total = power
    print(letter, ":",power)
    
print("total: ", total)

I just printed the result, but you can play with the concept to do what you actually need.

CodePudding user response:

With Pandas, I would organize the data as follows:

End_date Value
25-05-2017 234
03-06-2016 633
30-10-2019 935
16-12-2020 673
06-07-2019 825

You may want to set the letters 'A' etc. as index or an extra column, though not strictly required. There is no need to put the Start_Date in dataframe if it is constant.

Assume that all dates are date object. Now, subtract the end date by start date and divide by 365:

df['Duration'] = (df['End_date'] - Start_Date).dt.days / 365

Do the 'Power of' row-wise:

df['Value_powered'] = df['Value'] ** df['Duration']

Finally, sum:

result = df['Value_powered'].sum()
  • Related