this is my first question here, so go easy on me.
I've computed a certain portfolio in python, for which I've gotten a dataframe (or list for that matter) of around 3000 daily returns in the format of
2006-12-19 -0.005811
2006-12-20 0.003131
2006-12-21 0.002231
2006-12-22 -0.005467
2006-12-27 0.008004
... ...
2022-03-25 -0.003178
2022-03-28 0.003613
2022-03-29 0.016891
2022-03-30 -0.007323
2022-03-31 -0.007277
My problem now is that I would like to statistically analyze the performance of it, so I need to start at various dates (to analyze certain periods) with 100 (percent) and then aggregate these returns to it. I tried for loops, but I got weird results, and was hoping to get a hint on here or an explanation to how to solve this.
My solution so far was this:
startvalue = 100
for row in TVWR:
startvalue = startvalue*(1 row)
with this, I get the result of 475.8556658351451, which seems a little high. Additionally, I would like to get a result as a Dataframe looking something like this:
2006-12-19 99.994
2006-12-20 100.012
2006-12-21 100.432
2006-12-22 100.024
2006-12-27 100.895
Thank you for your kind help!
CodePudding user response:
First things first, I know of a proverb that says "If you loop in pandas, you're doing something wrong". Most of the time it's true.
In your case, you can use apply
or transform
to modify the series (I'll be guessing names since those are omitted):
df["values"] = df["values"].apply(lambda x: START_VALUE*(1 x))
or
df["values"] = df["values"].transform(lambda x: START_VALUE*(1 x))
Here, df
is your DataFrame and values
is the name of the column containing the percentages (-0.005811, 0.003131 etc...). START_VALUE
is your startvalue
variable, I am creating it in all caps to signify it should be treated as a constant.
Your example fails because you keep modifying startvariable
:
for row in TVWR:
startvalue = startvalue*(1 row)
# this means that startvalue isn't 100 anymore !
CodePudding user response:
I was finally able to solve my issue:
THREE_DAILY_RETURNS is my DataFrame with three columns of daily returns
THREE_DAILY_RETURNS_CUM = (1 THREE_DAILY_RETURNS).cumprod() - 1
THREE_DAILY_RETURNS_CUM = THREE_DAILY_RETURNS_CUM 1
THREE_DAILY_RETURNS_CUM = THREE_DAILY_RETURNS_CUM * 100
THREE_DAILY_RETURNS_CUM
This gives me the desired result:
Total SMIC MeanReturn
Date
2006-12-19 99.418891 99.306143 99.314898
2006-12-20 99.730204 99.393379 99.765376
2006-12-21 99.952733 99.446009 100.088569
2006-12-22 99.406320 98.893548 99.534840
2006-12-27 100.201991 99.976018 100.337066
... ... ... ...
2022-03-25 383.051964 222.143133 379.964060
2022-03-28 384.435785 223.113019 381.076173
2022-03-29 390.929235 226.226521 388.694509
2022-03-30 388.066547 224.724212 385.177961
2022-03-31 385.242695 223.304389 382.313810
Probably not the most elegant solution, but it worked for me.