I have the following dataframe df:
tickers <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL')
returns <- c(0.1, 0.2, 0.3, -0.15, -0.25, .09, 0.4, -0.2)
df <- data.frame(tickers, returns)
df
tickers returns
1 AAPL 0.10
2 AAPL 0.20
3 AAPL 0.30
4 AAPL -0.15
5 AAPL -0.25
6 AAPL 0.09
7 AAPL 0.40
8 AAPL -0.20
I would like to add a column with the lagged returns. To do so, I use:
df$lag_1 <- Lag(df$returns , k=1)
Which produces:
tickers returns Lag.1
1 AAPL 0.10 NA
2 AAPL 0.20 0.10
3 AAPL 0.30 0.20
4 AAPL -0.15 0.30
5 AAPL -0.25 -0.15
6 AAPL 0.09 -0.25
7 AAPL 0.40 0.09
8 AAPL -0.20 0.40
So far, so good. But, when I try to use a variable to define the 2-day lag, I get an error message:
lookup <- 'returns'
df$lag_2 <- Lag(paste('df$', lookup) , k=2)
Error in Lag.default(paste("df$", lookup), k = 2) :
x must be a time series or numeric vector
CodePudding user response:
Use [[ instead of $:
library(quantmod)
df$lag_2 <- Lag(df[[lookup]], k = 2)[,1]
-output
> df
tickers returns lag_2
1 AAPL 0.10 NA
2 AAPL 0.20 NA
3 AAPL 0.30 0.10
4 AAPL -0.15 0.20
5 AAPL -0.25 0.30
6 AAPL 0.09 -0.15
7 AAPL 0.40 -0.25
8 AAPL -0.20 0.09
CodePudding user response:
The stats::lag function is designed for application to time series objects. It is not designed to "lag" ordinary vectors. The lagging of a time series object is accomplished by altering its time base. The quantmod package's help page for its Lag function describes the differences succinctly:
This function differs from lag by returning the original series modified, as opposed to simply changing the time series properties. It differs from the like named Lag in the Hmisc as it deals primarily with time-series like objects.
It is important to realize that if there is no applicable method for Lag, the value returned will be from lag in base. That is, coerced to 'ts' if necessary, and subsequently shifted.
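The "changing the time series properties" behaviour can be seen in base R alone. A minimal sketch, assuming a small made-up series x (its values are just the first three returns from the question): stats::lag leaves the data where they are and only moves the time base.

```r
# Illustrative series; the values are the first three returns from the question.
x <- ts(c(0.1, 0.2, 0.3))

# stats::lag shifts the time base back by k; the data themselves are not moved.
y <- stats::lag(x, k = 1)

same_values <- identical(as.numeric(x), as.numeric(y))  # data unchanged
tsp_x <- tsp(x)  # start, end, frequency of the original series
tsp_y <- tsp(y)  # start and end are each reduced by 1
```

This is why stats::lag is not a drop-in way to build a lagged column in a dataframe: once the time attributes are stripped, the "lagged" values are identical to the originals.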
The question did not include the code needed to load the quantmod package, which provides Lag:
library(quantmod)
The other learning opportunity is that the expression paste('df$', lookup) will never be effective. That attempt probably comes from experience with what are called "macro" languages. R does not parse and interpret constructed strings like that: paste simply returns a character string, which is why Lag complains that x is not a numeric vector. Unquoted names typed at the console are handled differently than strings built with paste or paste0. As @akrun demonstrated, it is possible to use the extraction and assignment operators, [[ and [[<-, with string values.
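To make the point concrete, here is a minimal base-R sketch. The shortened dataframe and the hand-rolled lag (padding with NA and dropping the last values) are assumptions for illustration only, used so the example runs without quantmod; they are not what quantmod::Lag does internally.

```r
# Shortened version of the question's data (illustrative).
df <- data.frame(tickers = rep("AAPL", 4),
                 returns = c(0.1, 0.2, 0.3, -0.15))
lookup <- "returns"

# paste() only builds a character string; note the default space separator.
built <- paste("df$", lookup)      # the string "df$ returns", not the column

# [[ accepts a string and returns the actual numeric column.
col <- df[[lookup]]

# [[<- also accepts a constructed string as the new column name.
new_name <- paste0("lag_", 2)
df[[new_name]] <- c(rep(NA, 2), head(col, -2))  # hand-rolled 2-step lag
```

Here built is text, while col is the numeric vector that a lagging function needs.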
And a third learning opportunity comes from noticing that the name that appears at the top of your new column was not the same one that you assigned to it. What happened is that the result from quantmod::Lag was a one-column matrix with the column name "Lag.1" rather than a vector. The quantmod package is designed to work with zoo-like objects, which are matrices rather than dataframes. Note further that trying to access that column with the name that appears in the print representation will not succeed:
> str(df)
'data.frame': 8 obs. of 3 variables:
$ tickers: chr "AAPL" "AAPL" "AAPL" "AAPL" ...
$ returns: num 0.1 0.2 0.3 -0.15 -0.25 0.09 0.4 -0.2
$ lag_1 : num [1:8, 1] NA 0.1 0.2 0.3 -0.15 -0.25 0.09 0.4
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Lag.1"
> df$Lag.1 # FAIL
NULL
> df$lag_1 # Success
Lag.1
[1,] NA
[2,] 0.10
[3,] 0.20
[4,] 0.30
[5,] -0.15
[6,] -0.25
[7,] 0.09
[8,] 0.40
If you will be using "quantmod" or "tidyquant", you will definitely need to understand the differences between accessing values inside dataframes and accessing values in matrices.
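A minimal sketch of that difference, and of why the [,1] in the other answer matters. The matrix m below is a hand-built stand-in for the kind of one-column result shown in the str output above (an assumption, so the example runs without quantmod loaded).

```r
# Shortened version of the question's data (illustrative).
df <- data.frame(tickers = rep("AAPL", 3),
                 returns = c(0.1, 0.2, 0.3))

# Hand-built stand-in for a one-column matrix result with column name "Lag.1".
m <- matrix(c(NA, 0.1, 0.2), ncol = 1, dimnames = list(NULL, "Lag.1"))

df$lag_1 <- m                            # the dataframe column is now a matrix
was_matrix   <- is.matrix(df$lag_1)      # TRUE
by_print_name <- df$Lag.1                # NULL: "Lag.1" is not a dataframe column

# Matrix-style indexing pulls out a plain vector; store that instead.
df$lag_1 <- df$lag_1[, 1]
is_plain_vector <- is.null(dim(df$lag_1))
```

After the last assignment the column is an ordinary numeric vector, and its printed name matches the name used to access it.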