Lag not working correctly on an R dataframe

Time:10-27

I'm using base R 4.2.1 and have a strange problem with lagging a column. My dataframe has two columns, one for time and the other for the yield of a bond:

df_bond <- data.frame(t   = seq(0, 10, 1),
                      y_t = seq(.1, 0, -.01)
                      )
> df_bond
    t  y_t
1   0 0.10
2   1 0.09
3   2 0.08
4   3 0.07
5   4 0.06
6   5 0.05
7   6 0.04
8   7 0.03
9   8 0.02
10  9 0.01
11 10 0.00

I then add a column for price and it works like a charm:

> df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
> df_bond
    t  y_t       p_t
1   0 0.10  38.55433
2   1 0.09  46.04278
3   2 0.08  54.02689
4   3 0.07  62.27497
5   4 0.06  70.49605
6   5 0.05  78.35262
7   6 0.04  85.48042
8   7 0.03  91.51417
9   8 0.02  96.11688
10  9 0.01  99.00990
11 10 0.00 100.00000

But if I now add a column for the return, I get something weird:

> df_bond$r_prior_yr <- df_bond$p_t / lag(df_bond$p_t, 1)
> df_bond
    t  y_t       p_t r_prior_yr
1   0 0.10  38.55433          1
2   1 0.09  46.04278          1
3   2 0.08  54.02689          1
4   3 0.07  62.27497          1
5   4 0.06  70.49605          1
6   5 0.05  78.35262          1
7   6 0.04  85.48042          1
8   7 0.03  91.51417          1
9   8 0.02  96.11688          1
10  9 0.01  99.00990          1
11 10 0.00 100.00000          1

Evaluating the division without the assignment shows that the result picks up a time-series attribute (tsp), but I can't for the life of me explain why the ratio is always 1:

> df_bond$p_t / lag(df_bond$p_t, 1)
 [1] 1 1 1 1 1 1 1 1 1 1 1
attr(,"tsp")
[1]  0 10  1

Any assistance is greatly appreciated.

Sincerely and with many thanks in advance

Thomas Philips

Answer:

Try using dplyr::lag instead. It shifts the values down by one position and fills the first element with NA:

df_bond <- data.frame(t   = seq(0, 10, 1),
                      y_t = seq(.1, 0, -.01)
)
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
df_bond$r_prior_yr <- df_bond$p_t / dplyr::lag(df_bond$p_t, 1)
df_bond
#>     t  y_t       p_t r_prior_yr
#> 1   0 0.10  38.55433         NA
#> 2   1 0.09  46.04278   1.194231
#> 3   2 0.08  54.02689   1.173406
#> 4   3 0.07  62.27497   1.152666
#> 5   4 0.06  70.49605   1.132013
#> 6   5 0.05  78.35262   1.111447
#> 7   6 0.04  85.48042   1.090971
#> 8   7 0.03  91.51417   1.070586
#> 9   8 0.02  96.11688   1.050295
#> 10  9 0.01  99.00990   1.030099
#> 11 10 0.00 100.00000   1.010000
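Since the question says base R, here is a minimal sketch that avoids the dplyr dependency. lag_vec is a hypothetical helper written only for this example (it is not a base R or dplyr function); it shifts the vector down by k positions and pads the front with NA, which mirrors what dplyr::lag does with its default fill.

```r
# Hypothetical helper (not part of base R or dplyr): shift x down by k
# positions and pad the front with NA, so element i holds x[i - k].
lag_vec <- function(x, k = 1) {
  stopifnot(k >= 0, k <= length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

df_bond <- data.frame(t   = seq(0, 10, 1),
                      y_t = seq(.1, 0, -.01))
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)

# Gross one-period return: this year's price divided by last year's price
df_bond$r_prior_yr <- df_bond$p_t / lag_vec(df_bond$p_t, 1)
df_bond
```

Because the shift is done by plain indexing rather than through the stats::lag generic, the result stays an ordinary numeric vector and no tsp attribute ends up in the data frame column.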

Answer:

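Using the lag function from dplyr should give the result you're looking for. I suspect the issue is that, without dplyr loaded, lag resolves to stats::lag. For a plain numeric vector, stats::lag does not shift the values at all; it only attaches a time-series tsp attribute with a shifted time base (which is the attr(,"tsp") you see). So df_bond$p_t / lag(df_bond$p_t, 1) divides each element by itself, giving all 1s.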

    df_bond <- data.frame(t   = seq(0, 10, 1),
                          y_t = seq(.1, 0, -.01))
    df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
    df_bond$r_prior_yr <- df_bond$p_t / dplyr::lag(df_bond$p_t, 1)
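To illustrate why the original code returns all 1s, and how stats::lag is meant to be used, here is a small sketch. The short vector x is made-up illustrative data. stats::lag is designed for ts objects: arithmetic between two ts objects is aligned on their time index, so dividing a series by stats::lag(p, -1) (where k = -1 makes the value at time t the one from t - 1) gives the one-period return over the overlapping times only.

```r
# Why the question's result is all 1s: for a plain vector, stats::lag()
# leaves the values untouched and only attaches/shifts a "tsp" attribute.
x  <- c(38.5, 46.0, 54.0)          # illustrative values only
lx <- stats::lag(x, 1)
print(as.vector(lx))               # same values as x
print(attr(lx, "tsp"))             # time base shifted from 1..3 to 0..2
print(x / lx)                      # element-wise x / x, i.e. all 1s

# Rebuild the question's data frame so this block runs on its own
df_bond <- data.frame(t   = seq(0, 10, 1),
                      y_t = seq(.1, 0, -.01))
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)

# With real "ts" objects, arithmetic aligns on time, so the lag takes effect.
p <- ts(df_bond$p_t, start = 0)    # times 0..10
r <- p / stats::lag(p, -1)         # overlapping times 1..10 only
df_bond$r_prior_yr <- c(NA, as.vector(r))  # pad year 0, which has no prior price
df_bond
```

The ts route returns one fewer observation than the input (there is no prior price at t = 0), which is why the sketch pads with NA before storing it back in the data frame.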