Error in quantmod::Lag when adding columns to a dataframe

Time:04-30

I have the following dataframe df:

tickers <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL')
returns <- c(0.1, 0.2, 0.3, -0.15, -0.25, .09, 0.4, -0.2)

df <- data.frame(tickers, returns)

df
  tickers returns
1    AAPL    0.10
2    AAPL    0.20
3    AAPL    0.30
4    AAPL   -0.15
5    AAPL   -0.25
6    AAPL    0.09
7    AAPL    0.40
8    AAPL   -0.20

I would like to add a column with the lagged returns. To do so, I use:

df$lag_1 <- Lag(df$returns , k=1)

Which produces:

  tickers returns Lag.1
1    AAPL    0.10    NA
2    AAPL    0.20  0.10
3    AAPL    0.30  0.20
4    AAPL   -0.15  0.30
5    AAPL   -0.25 -0.15
6    AAPL    0.09 -0.25
7    AAPL    0.40  0.09
8    AAPL   -0.20  0.40

So far, so good. But, when I try to use a variable to define the 2-day lag, I get an error message:

lookup <- 'returns'

df$lag_2 <- Lag(paste('df$', lookup) , k=2)

Error in Lag.default(paste("df$", lookup), k = 2) : 
  x must be a time series or numeric vector

CodePudding user response:

Use [[ instead of $

library(quantmod)
df$lag_2 <- Lag(df[[lookup]], k = 2)[,1]

-output

> df
  tickers returns lag_2
1    AAPL    0.10    NA
2    AAPL    0.20    NA
3    AAPL    0.30  0.10
4    AAPL   -0.15  0.20
5    AAPL   -0.25  0.30
6    AAPL    0.09 -0.15
7    AAPL    0.40 -0.25
8    AAPL   -0.20  0.09
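For comparison, the same lag_2 column can be sketched in base R without quantmod. The helper name lag_vec below is hypothetical (it is not part of quantmod or of this answer); it assumes a plain numeric vector and simply pads the front with NA.

```r
# Rebuild the example data from the question
tickers <- rep("AAPL", 8)
returns <- c(0.1, 0.2, 0.3, -0.15, -0.25, 0.09, 0.4, -0.2)
df <- data.frame(tickers, returns)

# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_vec <- function(x, k = 1) {
  stopifnot(k >= 0, k <= length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

lookup <- "returns"

# [[ accepts the column name as a string; the result is a plain vector,
# so no [, 1] is needed here
df$lag_2 <- lag_vec(df[[lookup]], k = 2)
df
```

Because lag_vec returns an ordinary vector, the new column keeps the name it was assigned.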

CodePudding user response:

The stats::lag function is designed for application to time series objects. It is not designed to "lag" ordinary vectors. The lagging of a time series object is accomplished by altering its time base. The quantmod package's help page for its Lag function describes the differences succinctly:

This function differs from lag by returning the original series modified, as opposed to simply changing the time series properties. It differs from the like named Lag in the Hmisc as it deals primarily with time-series like objects.

It is important to realize that if there is no applicable method for Lag, the value returned will be from lag in base. That is, coerced to 'ts' if necessary, and subsequently shifted.
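To illustrate the point about stats::lag altering only the time base, here is a minimal sketch using a small ts object. The values are borrowed from the question and the start time of 1 is an arbitrary assumption.

```r
# A short time series starting at time 1 (the start value is arbitrary)
x <- ts(c(0.1, 0.2, 0.3, -0.15), start = 1)

# stats::lag shifts the time base, not the data
lagged <- stats::lag(x, k = 1)

# The stored values are unchanged
identical(as.numeric(lagged), as.numeric(x))

# Only the time index moved: x runs over times 1..4, lagged over 0..3
time(x)
time(lagged)
```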

The question did not include the code needed to load the quantmod package:

 library(quantmod)

The other learning opportunity is that the expression paste('df$', lookup) will never be effective. That attempt probably comes from experience with what are called "macro" languages. R does not parse and evaluate constructed strings like that: paste returns an ordinary character string (here "df$ returns", since the default separator is a space), and Lag receives that string rather than the column. Unquoted expressions typed at the console are handled differently from strings built with paste or paste0. As @akrun demonstrated, it is possible to use the extraction and assignment operators, [[ and [[<-, with string-valued arguments.
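A small base R sketch of that difference, using a shortened copy of the question's data frame:

```r
df <- data.frame(tickers = rep("AAPL", 3), returns = c(0.1, 0.2, 0.3))
lookup <- "returns"

# paste() only builds a character string; the default sep = " " adds a space
built <- paste("df$", lookup)
built              # "df$ returns"
is.numeric(built)  # FALSE, which is why Lag complains about its input

# [[ takes the column name as a string and returns the column itself
by_name <- df[[lookup]]
identical(by_name, df$returns)  # TRUE

# [[<- works the same way for assignment (new_name is an illustrative variable)
new_name <- "doubled"
df[[new_name]] <- 2 * df[[lookup]]
names(df)
```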

And a third learning opportunity comes from noticing that the name that appears at the top of your new column was not the same one that you assigned to it. What happened is that the result from quantmod::Lag was a one-column matrix with the column name "Lag.1" rather than a vector. The quantmod package is designed to work with zoo-like objects, which are matrices rather than data frames. Note further that trying to access that column with the name that appears in the print representation will not succeed:

> str(df)
'data.frame':   8 obs. of  3 variables:
 $ tickers: chr  "AAPL" "AAPL" "AAPL" "AAPL" ...
 $ returns: num  0.1 0.2 0.3 -0.15 -0.25 0.09 0.4 -0.2
 $ lag_1  : num [1:8, 1] NA 0.1 0.2 0.3 -0.15 -0.25 0.09 0.4
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Lag.1"
> df$Lag.1         # FAIL
NULL
> df$lag_1         # Success
     Lag.1
[1,]    NA
[2,]  0.10
[3,]  0.20
[4,]  0.30
[5,] -0.15
[6,] -0.25
[7,]  0.09
[8,]  0.40

If you will be using "quantmod" or "tidyquant", you will definitely need to understand the differences between accessing values inside matrices and accessing values in data frames.
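The matrix-versus-data-frame distinction can be reproduced in base R. In this sketch the one-column matrix m is a hand-built stand-in for the shape of the Lag result shown above, not actual quantmod output.

```r
df <- data.frame(returns = c(0.1, 0.2, 0.3))

# Hand-built stand-in: a one-column matrix whose column is named "Lag.1"
m <- matrix(c(NA, 0.1, 0.2), ncol = 1, dimnames = list(NULL, "Lag.1"))

# Assigning a matrix stores it as a matrix column under the name lag_1
df$lag_1 <- m
is.matrix(df$lag_1)  # TRUE
is.null(df$Lag.1)    # TRUE: "Lag.1" is not a column of the data frame

# Matrix-style indexing reaches the inner column by its own name
df$lag_1[, "Lag.1"]

# Storing only the first matrix column gives an ordinary numeric vector
df$lag_1 <- m[, 1]
is.matrix(df$lag_1)  # FALSE
str(df)
```

This is the same reason the first answer appends [,1] to the Lag call.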
