Home > Back-end >  Detrending by date ranges
Detrending by date ranges

Time:08-29

Considering a df structured like this

Time             X

01-01-18         1  
01-02-18        20  
01-03-18        34  
01-04-18        67  
01-01-18        89  
01-02-18        45 
01-03-18        22 
01-04-18         1 
01-01-19        11 
01-02-19         6 
01-03-19        78 
01-04-19         5
01-01-20        23 
01-02-20         6 
01-03-20         9 
01-04-20        56 
01-01-21        78 
01-02-21        33 
01-03-21         2 
01-04-21        67 

I want to de-trend the times series from February to April for each year and append it to a new column Y

So far I thought something like this

from datetime import date, timedelta
import pandas as pd

df = pd.read_csv(...)

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df['Y'] = np.nan

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date   timedelta(n)

start_date = df.date(2018, 1, 2)
end_date = df.date(2018, 1, 4)
for date in daterange(start_date, end_date):
     df['Y'] = signal.detrend(df['X'])

My concern is that it would iterate over single observations and not over the trend of the selected period. Any way to fix it?

Another issue is how to iterate it over all the years without changing start/end dates each time

CodePudding user response:

When converting strings to the datetime format, you can specify the format directly. infer_datetime_format can mix up day and month.

df['date'] = pd.to_datetime(df['Time'], format='%d-%m-%y')

from scipy.signal import detrend

IIUC, here are two ways to achieve what you want:

1.

    I would prefer this way - using .apply():

def f(df):
    result = df['X'].copy()
    months = df['date'].dt.month
    mask = (months >= 2) & (months <= 4)
    result[mask] = detrend(result[mask])
    return result

df['new'] = df.groupby(df['date'].dt.year, group_keys=False).apply(f)

2.

    Another way - using .transform():

ser = df['X'].copy()
ser.index = df['date']

def f(s):
    result = s.copy()
    months = s.index.month
    mask = (months >= 2) & (months <= 4)
    result[mask] = detrend(result[mask])
    return result

new = ser.groupby(ser.index.year).transform(f)
new.index = df.index
df['new'] = new

Result:

         date   X        new
0  2018-01-01   1   1.000000
1  2018-02-01  20 -22.428571
2  2018-03-01  34  -4.057143
3  2018-04-01  67  33.314286
4  2018-01-01  89  89.000000
5  2018-02-01  45  15.685714
6  2018-03-01  22  -2.942857
7  2018-04-01   1 -19.571429
8  2019-01-01  11  11.000000
9  2019-02-01   6 -24.166667
10 2019-03-01  78  48.333333
11 2019-04-01   5 -24.166667
12 2020-01-01  23  23.000000
13 2020-02-01   6   7.333333
14 2020-03-01   9 -14.666667
15 2020-04-01  56   7.333333
16 2021-01-01  78  78.000000
17 2021-02-01  33  16.000000
18 2021-03-01   2 -32.000000
19 2021-04-01  67  16.000000
  • Related