See the following datasets:
Date | ticker | overnight_return |
---|---|---|
2017-07-20 | CLXT | 0.019556 |
2017-07-21 | CLXT | 0.039778 |
2022-02-14 | ETNB | -0.006186 |
2022-02-15 | ETNB | 0.024590 |
I am currently testing a hypothesis based on the overnight return factor. I would like to first apply ranking for all the unique values in ticker
column on every Date
. And then z-scoreing for the ranks. Lastly, I would like to rank them in terms of deciles.
I used the following code to get the final z-score for one date:
import scipy.stats as stats
stats.zscore(equity_daily[equity_daily.Date == "2017-07-20"].overnight_return.rank().dropna().values)
Now I would like to get the z-score for every day based on the rank of all tickers on that particular day.
My apporach was to get the pivoted table then create a new table containing the z-scores.
equity_daily.pivot(columns = "ticker", values = "overnight_return", index = "Date")
But the following error occured:
ValueError: Index contains duplicate entries, cannot reshape
Desired outcome:
Date | ticker | overnight_return | Decile_rank |
---|---|---|---|
2017-07-20 | CLXT | 0.019556 | 0 |
2017-07-21 | CLXT | 0.039778 | 2 |
2022-02-14 | ETNB | -0.006186 | 9 |
2022-02-15 | ETNB | 0.024590 | 8 |
CodePudding user response:
Not having more of a data sample, hard to test myself, but...
Try pivot_table()
instead of pivot()
. pivot
doesn't do aggregations
CodePudding user response:
I have set up multiple helper columns.
def z_score(x):
"""Helper function for Normalization"""
return stats.zscore(x)
def decile_rank(x):
"""Helper function for decile ranking"""
return pd.qcut(x, 10, labels = False)
equity_daily["overnight_rank"] = equity_daily.groupby("Date")["overnight_return"].rank()
equity_daily["overnight_normalized"] = equity_daily.groupby("Date")["overnight_rank"].apply(z_score)
equity_daily["decile_rank"] = equity_daily.gruopby("Date")["overnight_normalizaed"].apply(decile_rank)
Howeve, the folloiwng error occured:
ValueError: Input array must be 1 dimensional