Apply a weight to a dataframe by months back

Using pandas version 1.3.2.

I have a DataFrame that is indexed by Date and has a couple of columns like this:

                 XLY        UA
Date
2017-04-01  0.023991  0.060656
2017-05-01  0.010993 -0.081401
2017-06-01 -0.015596  0.130679
2017-07-01  0.019302 -0.101686
2017-08-01 -0.018608 -0.166207
2017-09-01  0.004684 -0.005298
2017-10-01  0.021203 -0.232357
2017-11-01  0.050658  0.034692
2017-12-01  0.021107  0.116513
2018-01-01  0.092411 -0.035285
2018-02-01 -0.034691  0.171206
...
2022-03-01  0.079468  0.039667

I have a Python dictionary of weights keyed by tuples: the key (0,12) means that rows 0 to 12 months back get that weight, (13,24) covers rows 13 to 24 months back, and so on.

weights = {(0,12): 6, (13,24) : 5, (25,36): 4, (37,48): 3, (49,60): 2, (61, 72): 1, (72,83): 1}

Is there a way to apply these weights to each row of the DataFrame? For example, the row 2022-03-01 would become 0.079468 * 6 and 0.039667 * 6, and likewise for every row within 12 months of today. From 2021-03-01 back the weight would be 5, and so on.

The index is Datetime, not string.

I know I can do this with a loop. I am looking for a concise, functional version.

**Edit 1**

bins = pd.to_datetime([today] + [today - pd.DateOffset(months = k[1]) for k in weights])

gives this error:

Traceback (most recent call last):
  File "correlations.py", line 131, in <module>
    bins = pd.to_datetime([today] + [today - pd.DateOffset(months = k[1]) for k in weights])
  File "correlations.py", line 131, in <listcomp>
    bins = pd.to_datetime([today] + [today - pd.DateOffset(months = k[1]) for k in weights])
TypeError: 'int' object is not subscriptable
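
The traceback points at the `k[1]` inside the list comprehension. Iterating over a dict yields its keys, so `k[1]` only works when every key is a tuple. One situation that produces this exact message is a `weights` dict whose keys are plain integers at the point where line 131 runs. That is an assumption, since the full script is not shown, and `weights_int_keys` below is a hypothetical variant used only to reproduce the message:

```python
# Keys are tuples, as in the question.
weights_tuple_keys = {(0, 12): 6, (13, 24): 5}

# Hypothetical variant with integer keys (not from the question).
weights_int_keys = {12: 6, 24: 5}

# Iterating over a dict yields its keys, so each k here is a tuple.
upper_bounds = [k[1] for k in weights_tuple_keys]

# Here each k is an int, and indexing an int raises TypeError.
try:
    [k[1] for k in weights_int_keys]
except TypeError as exc:
    message = str(exc)

print(upper_bounds)
print(message)
```

If the dict in the failing script really is keyed by tuples, checking `type(next(iter(weights)))` right before line 131 would confirm or rule this out.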

CodePudding user response：

Try:

import pandas as pd

# Your dict
weights = {(0, 12): 6, (13, 24) : 5, (25, 36): 4, (37, 48): 3,
           (49,60): 2, (61, 72): 1, (72,83): 1}

# Today
today = pd.to_datetime('today').date()

# Create bins from today, using the upper bound (in months) of each key
bins = pd.to_datetime([today] + [today - pd.DateOffset(months=k[1]) for k in weights])

# bins must increase monotonically -> use [::-1] to reverse the list and labels
w = pd.cut(df.index, bins=bins[::-1], labels=list(weights.keys())[::-1]).map(weights)

# Answer from https://stackoverflow.com/a/71682982/15239951
cols = ['XLY', 'UA']
df[cols] = df[cols].mul(w, axis=0)

Output:

>>> df
                 XLY        UA
Date                          
2017-04-01  0.047982  0.121312
2017-05-01  0.021986 -0.162802
2017-06-01 -0.031192  0.261358
2017-07-01  0.038604 -0.203372
2017-08-01 -0.037216 -0.332414
2017-09-01  0.009368 -0.010596
2017-10-01  0.042406 -0.464714
2017-11-01  0.101316  0.069384
2017-12-01  0.042214  0.233026
2018-01-01  0.184822 -0.070570
2018-02-01 -0.069382  0.342412
2022-03-01  0.476808  0.238002

>>> bins
DatetimeIndex(['2022-03-30', '2021-03-30', '2020-03-30', '2019-03-30',
               '2018-03-30', '2017-03-30', '2016-03-30', '2015-04-30'],
              dtype='datetime64[ns]', freq=None)

>>> w
Int64Index([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6], dtype='int64')
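
The note about monotonically increasing bins can be illustrated without dates. The following is a minimal sketch, not the answer's code: it assumes the month offsets have already been computed as integers (`months_back` holds hypothetical values), and it uses `labels=False` so that `pd.cut` returns the position of the bin each value falls into. Bins are right-closed by default, so the edges `[-1, 12, 24, ...]` give the intervals (-1, 12], (12, 24], and so on.

```python
import numpy as np
import pandas as pd

weights = {(0, 12): 6, (13, 24): 5, (25, 36): 4, (37, 48): 3,
           (49, 60): 2, (61, 72): 1, (72, 83): 1}

# Upper bound of each key, already in increasing order; -1 opens the first bin
# so that a month offset of 0 is included.
edges = [-1] + [hi for (_, hi) in weights]
values = np.array(list(weights.values()))

# Hypothetical month offsets, chosen to hit bin boundaries.
months_back = pd.Series([0, 12, 13, 50, 59, 83])

# labels=False returns the integer position of the bin for each value.
codes = pd.cut(months_back, bins=edges, labels=False)

# Look up the weight that belongs to each bin position.
w = values[codes.to_numpy()]
print(w)
```

Because the edges come from the dict in insertion order, this relies on the keys being listed from the most recent range to the oldest, as they are in the question.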

CodePudding user response：

This still uses a helper function that checks each key range to find the matching weight.

Only full months are counted, using (now - row.name).days // 30, where row.name is the row's index date. Adjust this if you define a month differently.

import datetime

def get_weight(month):
    weights = {(0,12): 6, (13,24) : 5, (25,36): 4, (37,48): 3, (49,60): 2, (61, 72): 1, (72,83): 1}
    # Return the weight of the first range that contains the month (bounds inclusive)
    for m, w in weights.items():
        if month in range(m[0], m[1] + 1):
            return w

now = datetime.datetime.now()
df.apply(lambda row: row * get_weight((now - row.name).days//30), axis=1)

Alternatively, build a map from each unique month difference to its weight, so the lookup is not repeated for every row:


all_months = (now - df.index).days//30

unique_months = all_months.unique()
weight_map = {}
for month in unique_months:
    weight_map[month] = get_weight(month)

df[['XLY', 'UA']] = df[['XLY', 'UA']].mul(all_months.map(weight_map), axis=0)

>>> df

                 XLY        UA
Date                          
2017-04-01  0.047982  0.121312
2017-05-01  0.021986 -0.162802
2017-06-01 -0.031192  0.261358
2017-07-01  0.038604 -0.203372
2017-08-01 -0.037216 -0.332414
2017-09-01  0.009368 -0.010596
2017-10-01  0.042406 -0.464714
2017-11-01  0.101316  0.069384
2017-12-01  0.042214  0.233026
2018-01-01  0.184822 -0.070570
2018-02-01 -0.069382  0.342412
2022-03-01  0.476808  0.238002

>>> weight_map
{60: 2, 59: 2, 58: 2, 57: 2, 56: 2, 55: 2, 54: 2, 53: 2, 52: 2, 51: 2, 50: 2, 1: 6}

>>> all_months
[60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 1]

Note that the values below have the weights applied a second time (for example 0.476808 * 6 = 2.860848), which is what results when the multiplication is run on an already weighted df:

>>> output

                 XLY        UA
Date                          
2017-04-01  0.095964  0.242624
2017-05-01  0.043972 -0.325604
2017-06-01 -0.062384  0.522716
2017-07-01  0.077208 -0.406744
2017-08-01 -0.074432 -0.664828
2017-09-01  0.018736 -0.021192
2017-10-01  0.084812 -0.929428
2017-11-01  0.202632  0.138768
2017-12-01  0.084428  0.466052
2018-01-01  0.369644 -0.141140
2018-02-01 -0.138764  0.684824
2022-03-01  2.860848  1.428012
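
The answer above counts a month as 30 days and says to change that if a month is defined differently. As one possible alternative, the sketch below counts calendar months from the year and month fields of the index and looks the weight up in a dict expanded from the ranges. It is a sketch under assumptions, not the answer's method: `today` is fixed to 2022-03-30 so the result is reproducible, `lookup` and `months_back` are names introduced here, and only three rows from the question's data are used.

```python
import pandas as pd

weights = {(0, 12): 6, (13, 24): 5, (25, 36): 4, (37, 48): 3,
           (49, 60): 2, (61, 72): 1, (72, 83): 1}

# Fixed reference date (assumption; the answers use the current date).
today = pd.Timestamp("2022-03-30")

# Three rows taken from the question's sample data.
df = pd.DataFrame(
    {"XLY": [0.023991, 0.092411, 0.079468],
     "UA": [0.060656, -0.035285, 0.039667]},
    index=pd.to_datetime(["2017-04-01", "2018-01-01", "2022-03-01"]),
)

# Calendar-month difference: only year and month matter, the day is ignored.
months_back = (today.year - df.index.year) * 12 + (today.month - df.index.month)

# For comparison, the 30-day definition used in the answer above.
thirty_day = (today - df.index).days // 30

# Expand each (low, high) range into one entry per month, bounds inclusive.
lookup = {m: w for (lo, hi), w in weights.items() for m in range(lo, hi + 1)}

# One weight per row, then scale every column row by row.
w = months_back.map(lookup)
weighted = df.mul(w.to_numpy(), axis=0)
print(weighted)
```

The two definitions can disagree near the edge of a range, so a row that sits close to a 12-month boundary may get a different weight depending on which one is used. Months beyond 83 are not in `lookup`, so those rows would come out as NaN rather than raising an error.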