I'm learning Python and trying to write a simple loop that adds dirty prices to my dataframe bond_df.
days_left is a Series, and bond_df is a pandas dataframe containing the closing prices used in the formula below.
If I run the command:
days = days_left[1].days
I get an integer with the value 2, which is exactly what I need: the number of days as a plain integer, without any other time information attached (see the attached picture). I use ".days" to extract the integer number of days and drop the hours, seconds, etc.
Because of this, I figured I could use it in a loop to build the dirty-price column in my dataframe:
for i, number in days_left:
    days = days_left[i].days
    bond_df['dirty_price'][i] = bond_df['closing_price'][i] + ((365 - days)/365)
However, this does not work and raises:
"TypeError: cannot unpack non-iterable Timedelta object"
I then figured that I could construct a loop using a range instead:
for i in range(0, len(days_left)):
    days = days_left[i].days
    bond_df['dirty_price'][i] = bond_df['closing_price'][i] + ((365 - days)/365)
    print(days, bond_df['dirty_price'])
This seems to work as intended.
But I would still like to find out what I did wrong in the first attempt.
Can somebody explain the difference between these two loops and why the first one fails?
All the best, Nic
CodePudding user response:
You can simplify this by using pandas' vectorized functionality:
import pandas as pd
DAYS_IN_YEAR = 365 # this actually isn't constant; adjust as needed
df = pd.DataFrame(
    {
        "days_left": [pd.Timedelta(days=1), pd.Timedelta(days=2), pd.Timedelta(days=3)],
        "closing_price": [1, 2, 3],
    }
)
df["dirty_price"] = df["closing_price"] + (
    (DAYS_IN_YEAR - df["days_left"].dt.total_seconds() / 86400)
    / DAYS_IN_YEAR
    # could also use df["days_left"].dt.days here if hours, minutes etc. don't matter
)
df
  days_left  closing_price  dirty_price
0    1 days              1     1.997260
1    2 days              2     2.994521
2    3 days              3     3.991781
CodePudding user response:
Here:
for i, number in days_left:
you meant
for i, number in enumerate(days_left):
most likely...
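A minimal sketch of the difference, assuming days_left is a pandas Series of Timedelta values. The sample data and the index labels 10, 20, 30 are made up for illustration. enumerate gives positions, while Series.items gives the index labels, which matters when the index is not 0, 1, 2, ...

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's days_left Series.
days_left = pd.Series(pd.to_timedelta([1, 2, 3], unit="D"), index=[10, 20, 30])

# enumerate() pairs each value with its position (0, 1, 2, ...).
by_position = [(i, td.days) for i, td in enumerate(days_left)]

# Series.items() pairs each value with its index label instead.
by_label = [(label, td.days) for label, td in days_left.items()]

print(by_position)
print(by_label)
```

If the loop body indexes by label (as bond_df['closing_price'][i] does), the labels from items() are usually the safer choice.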
CodePudding user response:
What you have is likely something like this:
In [20]: dt = pd.timedelta_range("1 day", "5 day", freq="1d")
In [21]: type(dt)
Out[21]: pandas.core.indexes.timedeltas.TimedeltaIndex
In Python, iteration gives you the values:
In [22]: [i for i in dt]
Out[22]:
[Timedelta('1 days 00:00:00'),
Timedelta('2 days 00:00:00'),
Timedelta('3 days 00:00:00'),
Timedelta('4 days 00:00:00'),
Timedelta('5 days 00:00:00')]
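This is why the first loop in the question fails: "for i, number in days_left" asks Python to unpack each value into two names, and a single Timedelta cannot be unpacked. A small sketch that reproduces the error, assuming a Series of Timedelta values (the sample data is made up):

```python
import pandas as pd

# Hypothetical stand-in for the asker's days_left Series.
days_left = pd.Series(pd.to_timedelta([1, 2, 3], unit="D"))

message = None
try:
    # Each iteration yields one Timedelta; unpacking it into (i, number) fails.
    for i, number in days_left:
        pass
except TypeError as exc:
    message = str(exc)

print(message)
```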
To also access the index during iteration, you need to use enumerate.
In [23]: [(i, day) for i, day in enumerate(dt)]
Out[23]:
[(0, Timedelta('1 days 00:00:00')),
(1, Timedelta('2 days 00:00:00')),
(2, Timedelta('3 days 00:00:00')),
(3, Timedelta('4 days 00:00:00')),
(4, Timedelta('5 days 00:00:00'))]
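That said, you should not be iterating. Use vector operations; something like this (with .dt.days to get integer days, assuming days_left is a Series of Timedelta values as in the question):
bond_df['dirty_price'] = bond_df['closing_price'] + ((365 - days_left.dt.days)/365)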
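For a version that can be run on its own, here is a sketch with made-up prices and day counts. It assumes days_left is a Series of Timedelta values sharing the same index as bond_df, and converts it to integer days with the .dt.days accessor before doing the arithmetic:

```python
import pandas as pd

# Hypothetical sample data; the real bond_df and days_left come from the question.
bond_df = pd.DataFrame({"closing_price": [100.0, 101.5, 99.0]})
days_left = pd.Series(pd.to_timedelta([2, 30, 365], unit="D"))

# .dt.days turns each Timedelta into an integer number of days,
# so the whole column is computed in one vectorized expression.
bond_df["dirty_price"] = bond_df["closing_price"] + (365 - days_left.dt.days) / 365

print(bond_df)
```

pandas aligns the two operands on their index labels, so if days_left and bond_df have different indexes the result will contain NaN for the labels that do not match.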