creating lags for validation set

Time:10-18

I am working with lags for a time series model. I want to automate the creation of lags, which I already did for the training set.

for i in range(1, n + 1):
    column_name = 'lag_q{}'.format(i)
    df_train[column_name] = df_train.groupby(by=['strain', 'sex', 'genotype'],
                                             dropna=False)['quantity'].shift(i)

However, for the validation set, I want only the first values to come from the actual quantities and the rest to come from the predictions. So I need to fill the validation df with the known values and leave blank cells that will later be filled with the forecasts.

These are the quantity values I have for the rows before the ones I want to fill.

quantity
26450
24707
25369
25193
27250

and this df would be the one I want back

lag_q1  lag_q2  lag_q3  lag_q4  lag_q5
 27250   25193   25369   24707   26450
         27250   25193   25369   24707
                 27250   25193   25369
                         27250   25193
                                 27250

I was trying with some for loops but I only managed to fill the first row

for i in range(1, n + 1):
    column_name = 'lag_q{}'.format(i)
    lags_cols.append(column_name)
    df_val[column_name] = ''
    df_val.loc[0,column_name] = df_train.iloc[-i]['quantity']
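
One way the loop above might be extended to every validation row is sketched here. It assumes that lag i of validation row r is known only when i > r (it then points back into the training data), and that all other cells should stay empty (NaN) until a forecast exists. The helper name `seed_lags` and its parameters are hypothetical, and grouping by strain/sex/genotype is left out for brevity.

```python
import numpy as np
import pandas as pd

def seed_lags(history, n_rows, n_lags):
    """Build an n_rows x n_lags lag table; cells that depend on forecasts stay NaN."""
    history = np.asarray(history, dtype=float)
    lags = pd.DataFrame(
        np.nan,
        index=range(n_rows),
        columns=[f"lag_q{i}" for i in range(1, n_lags + 1)],
    )
    for r in range(n_rows):
        for i in range(r + 1, n_lags + 1):
            # Lag i of validation row r is the (i - r)-th value from the end of the history
            if i - r <= len(history):
                lags.loc[r, f"lag_q{i}"] = history[-(i - r)]
    return lags

# The five quantity values from the question (last rows of the training set)
quantity = [26450, 24707, 25369, 25193, 27250]
lags = seed_lags(quantity, n_rows=5, n_lags=5)
print(lags)
```

Using NaN rather than an empty string keeps the lag columns numeric, which most models expect.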

CodePudding user response:

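You could use:

import numpy as np
import pandas as pd

N = len(df)
q = df['quantity'].to_numpy()
a = np.arange(N)
# The upper triangle of (a[:, None] - a - 1) holds -1, -2, ...: negative indices
# that read q from its end. The outer triu then zeroes the cells below the diagonal.
out = pd.DataFrame(np.triu(q[np.triu(a[:, None]-a-1)]),
                   columns=[f'lag_q{i + 1}' for i in range(N)])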

output:

   lag_q1  lag_q2  lag_q3  lag_q4  lag_q5
0   27250   25193   25369   24707   26450
1       0   27250   25193   25369   24707
2       0       0   27250   25193   25369
3       0       0       0   27250   25193
4       0       0       0       0   27250
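
The zeros below the diagonal are the cells that still wait for a forecast. As a rough sketch of how they might be filled row by row, the example below rebuilds the same matrix with NaN placeholders instead of zeros and writes each forecast into the later rows that need it. The function `naive_forecast` is a hypothetical stand-in for a fitted model (it just averages the row's lags), and `lags` and `preds` are illustrative names.

```python
import numpy as np
import pandas as pd

q = np.array([26450, 24707, 25369, 25193, 27250], dtype=float)
N = len(q)
a = np.arange(N)
offsets = a[:, None] - a - 1      # offsets[r, c] = r - c - 1
known = offsets < 0               # on or above the diagonal: value comes from history

# Same layout as the matrix above, but with NaN where a forecast is still needed
lags = pd.DataFrame(
    np.where(known, q[offsets], np.nan),
    columns=[f"lag_q{i + 1}" for i in range(N)],
)

def naive_forecast(row):
    # Hypothetical placeholder for model.predict: mean of the row's lags
    return float(np.mean(row))

preds = []
for r in range(N):
    y_hat = naive_forecast(lags.iloc[r].to_numpy())
    preds.append(y_hat)
    # The forecast for row r becomes lag k of row r + k
    for k in range(1, N - r):
        lags.iloc[r + k, k - 1] = y_hat

print(lags)
```

By the time row r is reached, all of its missing lags have already been written by the forecasts of rows 0 to r - 1, so each row is complete before it is used.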