Using pandas
version:
1.3.2
I have a DataFrame
that is indexed
by Date
and has a couple of columns
like this:
XLY UA
Date
2017-04-01 0.023991 0.060656
2017-05-01 0.010993 -0.081401
2017-06-01 -0.015596 0.130679
2017-07-01 0.019302 -0.101686
2017-08-01 -0.018608 -0.166207
2017-09-01 0.004684 -0.005298
2017-10-01 0.021203 -0.232357
2017-11-01 0.050658 0.034692
2017-12-01 0.021107 0.116513
2018-01-01 0.092411 -0.035285
2018-02-01 -0.034691 0.171206
...
2022-03-01 0.079468 0.039667
I have a python
dictionary
of weights where the tuple
(1,12) means 1 to 12 months back, apply that weight, and so on.
weights = {(0,12): 6, (13,24) : 5, (25,36): 4, (37,48): 3, (49,60): 2, (61, 72): 1, (72,83): 1}
Is there a way to apply
these weights to each row
of the Dataframe
so that for example, the row
2022-03-01
would be 0.079468 * 6
and .039667 * 6
and so on for all the rows that are 12 months from today, when it gets to 2021-03-01, it would apply 5 *
, etc.
The index
is Datetime
, not string
.
I know I can loop
and do this. I am looking for a functional
concise version.
**Edit 1**
bins = pd.to_datetime([today] [today - pd.DateOffset(months = k[1]) for k in weights])
give error:
Traceback (most recent call last):
File "correlations.py", line 131, in <module>
bins = pd.to_datetime([today] [today - pd.DateOffset(k[1]) for k in weights])
File "correlations.py", line 131, in <module>
bins = pd.to_datetime([today] [today - pd.DateOffset(months = k[1]) for k in weights])
File "correlations.py", line 131, in <listcomp>
bins = pd.to_datetime([today] [today - pd.DateOffset(months = k[1]) for k in weights])
TypeError: 'int' object is not subscriptable
idf:~/environments/python/yahoo_data$
CodePudding user response:
Try:
# Your dict
weights = {(0, 12): 6, (13, 24) : 5, (25, 36): 4, (37, 48): 3,
(49,60): 2, (61, 72): 1, (72,83): 1}
# Today
today = pd.to_datetime('today').date()
# Create bins from today
bins= pd.to_datetime([today] [today - pd.DateOffset(months=k[1]) for k in weights])
# bins must increase monotonically -> use [::-1] to reverse the list and labels
w = pd.cut(df.index, bins=bins[::-1], labels=list(weights.keys())[::-1]).map(weights)
# Answer from https://stackoverflow.com/a/71682982/15239951
cols = ['XLY', 'UA']
df[cols] = df[cols] = df[cols].mul(w, axis=0)
Output:
>>> df
XLY UA
Date
2017-04-01 0.047982 0.121312
2017-05-01 0.021986 -0.162802
2017-06-01 -0.031192 0.261358
2017-07-01 0.038604 -0.203372
2017-08-01 -0.037216 -0.332414
2017-09-01 0.009368 -0.010596
2017-10-01 0.042406 -0.464714
2017-11-01 0.101316 0.069384
2017-12-01 0.042214 0.233026
2018-01-01 0.184822 -0.070570
2018-02-01 -0.069382 0.342412
2022-03-01 0.476808 0.238002
>>> bins
DatetimeIndex(['2022-03-30', '2021-03-30', '2020-03-30', '2019-03-30',
'2018-03-30', '2017-03-30', '2016-03-30', '2015-04-30'],
dtype='datetime64[ns]', freq=None)
>>> w
Int64Index([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6], dtype='int64')
CodePudding user response:
This still uses a function which checks all the keys for determining the right weight.
Only full months are considered using (now - row.name).days//30
where row.name
is the index date. Change that if a month definition is different.
def get_weight(month):
weights = {(0,12): 6, (13,24) : 5, (25,36): 4, (37,48): 3, (49,60): 2, (61, 72): 1, (72,83): 1}
for m, w in weights.items():
if month in range(m[0], m[1] 1):
return w
now = datetime.datetime.now()
df.apply(lambda row: row * get_weight((now - row.name).days//30), axis=1)
Or it is possible to create a map for unique month differences using the initial weights(to avoid repeating the calculation):
all_monthes = (now - df.index).days//30
unique_months = all_months.unique()
weight_map = {}
for month in unique_months:
weight_map[month] = get_weight(month)
df[['XLY', 'UA']] = df[['XLY', 'UA']].mul(all_months.map(weight_map), axis=0)
>>> df
XLY UA
Date
2017-04-01 0.047982 0.121312
2017-05-01 0.021986 -0.162802
2017-06-01 -0.031192 0.261358
2017-07-01 0.038604 -0.203372
2017-08-01 -0.037216 -0.332414
2017-09-01 0.009368 -0.010596
2017-10-01 0.042406 -0.464714
2017-11-01 0.101316 0.069384
2017-12-01 0.042214 0.233026
2018-01-01 0.184822 -0.070570
2018-02-01 -0.069382 0.342412
2022-03-01 0.476808 0.238002
>>> weight_map
{60: 2, 59: 2, 58: 2, 57: 2, 56: 2, 55: 2, 54: 2, 53: 2, 52: 2, 51: 2, 50: 2, 1: 6}
>>> all_months
[60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 1]
>>> output
XLY UA
Date
2017-04-01 0.095964 0.242624
2017-05-01 0.043972 -0.325604
2017-06-01 -0.062384 0.522716
2017-07-01 0.077208 -0.406744
2017-08-01 -0.074432 -0.664828
2017-09-01 0.018736 -0.021192
2017-10-01 0.084812 -0.929428
2017-11-01 0.202632 0.138768
2017-12-01 0.084428 0.466052
2018-01-01 0.369644 -0.141140
2018-02-01 -0.138764 0.684824
2022-03-01 2.860848 1.428012