I have set of data as follow
import pandas as pd
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C',
'D', 'End_Date_D', 'E', 'End_Date_E'])
Output:
Start_Date A End_Date_A B End_Date_B C End_Date_C D End_Date_D E End_Date_E
0 01-01-2015 234 25-05-2017 633 03-06-2016 935 30-10-2019 673 16-12-2020 825 06-07-2019
Then, I want to perform a calculation whereby for every end date of letter need to be minus with the Start_Date
, then divide with 365 to get duration in years. Then the duration of each letter need to be use as a 'power of' of the value of their letter, respectively. Then the result of each letter need to be sum up to get the total.
Is there a way to perform a loop of this calculation?
CodePudding user response:
You need to use a parser. I'm assuming the data looks as you have posted.
For this example, I'm using dateutil
- but you can use any other in a similar way.
import pandas as pd
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C',
'D', 'End_Date_D', 'E', 'End_Date_E'])
from dateutil import parser
import math
letters = ["A", "B", "C", "D", "E"]
start_date = parser.parse(ds["Start_Date"][0])
total = 0
for letter in letters:
end_date = parser.parse(ds["End_Date_" letter][0])
years = (end_date - start_date).days / 365
power = math.pow(int(ds[letter][0]), years)
total = power
print(letter, ":",power)
print("total: ", total)
I just printed the result, but you can play with the concept to do what you actually need.
CodePudding user response:
With Pandas, I would organize the data as follows:
End_date | Value |
---|---|
25-05-2017 | 234 |
03-06-2016 | 633 |
30-10-2019 | 935 |
16-12-2020 | 673 |
06-07-2019 | 825 |
You may want to set the letters 'A' etc. as index or an extra column, though not strictly required. There is no need to put the Start_Date
in dataframe if it is constant.
Assume that all dates are date
object. Now, subtract the end date by start date and divide by 365:
df['Duration'] = (df['End_date'] - Start_Date).dt.days / 365
Do the 'Power of' row-wise:
df['Value_powered'] = df['Value'] ** df['Duration']
Finally, sum:
result = df['Value_powered'].sum()