I have a data frame `df1` like this:
time | Diamond.Hands | returns | volume | close |
---|---|---|---|---|
2021-02-16 10:00:00 | 0.4583333 | 0.0056710775 | 10059 | 53.20 |
2021-02-16 11:00:00 | 0.2352941 | -0.0037586920 | 8664 | 53.01 |
2021-02-16 12:00:00 | 0.4400000 | -0.0037586920 | 10059 | 52.40 |
```r
# Log return
prices <- df1$close
log_returns <- diff(log(prices), lag = 1)
df1$logreturns <- log_returns
```
This returns the error:

```
Error in `$<-.data.frame`(`*tmp*`, logreturns, value = c(0.000187952260679136, :
  replacement has 2219 rows, data has 2220
```
Do you have any ideas how to fix that?
Answer:
When you do `y <- diff(x, lag = m, differences = k)`, the resulting vector `y` has `m * k` fewer elements than `x`. If you want both `x` and `y` as columns of a data.frame/matrix, you need to pad `y` with `m * k` leading `NA`s.
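The padding rule can be wrapped in a small helper. This is only a sketch: `pad_diff()` is a hypothetical name, not a base R function. Instead of hard-coding `m * k` leading `NA`s, it pads by however many elements `diff()` actually dropped. That also covers the corner case where `x` is shorter than `lag * differences` and `diff()` returns an empty vector.

```r
# Hypothetical helper (not part of base R): diff(x, lag, differences)
# left-padded with NA so the result has exactly length(x) elements.
pad_diff <- function(x, lag = 1L, differences = 1L) {
  d <- diff(x, lag = lag, differences = differences)
  # diff() drops lag * differences elements (or returns an empty vector
  # when x is too short), so pad exactly the missing count at the front.
  c(rep(NA, length(x) - length(d)), d)
}

x <- c(1, 4, 9, 16, 25)
pad_diff(x)                   # c(NA, 3, 5, 7, 9)
pad_diff(x, lag = 2)          # c(NA, NA, 8, 12, 16)
pad_diff(x, differences = 2)  # c(NA, NA, 2, 2, 2)
```

With the defaults (`lag = 1`, `differences = 1`), `pad_diff(log(df1$close))` gives the same vector as the one-`NA` fix for log returns.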
In your case, `m = 1` and `k = 1`, so you need to pad one `NA`:

```r
df1$logreturns <- c(NA, log_returns)
```
More concisely, you can collapse your three lines of code into one:

```r
df1$logreturns <- c(NA, diff(log(df1$close)))
```
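To see both the failure and the fix on a small scale, here is a sketch using only the three rows shown in the question (the other columns are left out). The error text printed by `conditionMessage()` depends on your R locale, which is why the question shows it in German.

```r
# Toy version of df1 built from the three rows shown in the question
df1 <- data.frame(
  time  = as.POSIXct(c("2021-02-16 10:00:00", "2021-02-16 11:00:00",
                       "2021-02-16 12:00:00"), tz = "UTC"),
  close = c(53.20, 53.01, 52.40)
)

# The original assignment fails: diff() returns nrow(df1) - 1 values
msg <- tryCatch({
  tmp <- df1
  tmp$logreturns <- diff(log(tmp$close))
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # in an English locale: "replacement has 2 rows, data has 3"

# The fix: prepend one NA so the column has one value per row
df1$logreturns <- c(NA, diff(log(df1$close)))
print(df1)
```

The first row gets `NA` because there is no earlier price to compare against. Each later row holds `log(close[t] / close[t - 1])`.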
Remark: if you want to use `diff()` inside `mutate()` in dplyr, something like this should work:

```r
library(dplyr)

df1 <- df1 %>% mutate(logreturns = c(NA, diff(log(close))))
```
Here is another possibly related Q & A: Error when using "diff" function inside of dplyr mutate.
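If you would rather not depend on dplyr, base R's `transform()` can add the column in a similar style. This is a swapped-in base R alternative, not part of the answer above. It is a minimal sketch on a toy `df1` that holds only the `close` values from the question.

```r
# Base-R alternative to mutate(): transform() evaluates the expression
# inside df1, so `close` can be referenced without df1$
df1 <- data.frame(close = c(53.20, 53.01, 52.40))
df1 <- transform(df1, logreturns = c(NA, diff(log(close))))
df1$logreturns
```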