I have multiple data.frame objects of unequal lengths. I would like to find the most recent date in each of them and store the results somewhere.
Here is a hopefully reproducible example to illustrate what I would like (with comments and sources). This gives 7 data.frame objects of variable lengths:
library(quantmod)
# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c("NKLA", "MPNGF", "RMO", "JD", "COIN")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = globalenv(), from = "2020-01-01", to = "2021-02-02")
# Load ticker data from 2020-01-01 till yesterday (if not weekend or holiday)
tickers2 <- c("IBM", "AAPL", "MRNA")
getSymbols.yahoo(tickers2, auto.assign = TRUE, env = globalenv(), from = "2020-01-01")
# Close all Internet connections as a precaution
# https://stackoverflow.com/a/52758758/2950721
closeAllConnections()
# Find xts objects
xtsObjects <- names(which(unlist(eapply(.GlobalEnv, is.xts))))
# Convert xts to data.frame
# https://stackoverflow.com/a/69246047/2950721
for (i in seq_along(xtsObjects)) {
  assign(xtsObjects[i], fortify.zoo(get(xtsObjects[i])))
}
# 1st column name from Index to Date
# https://stackoverflow.com/a/69292036/2950721
for (i in seq_along(xtsObjects)) {
  tmp <- get(xtsObjects[i])
  colnames(tmp)[colnames(tmp) == "Index"] <- "Date"
  assign(xtsObjects[i], tmp)
}
remove(tmp)
Retrieving the dates individually is pretty straightforward:
max(AAPL$Date)
max(IBM$Date)
max(JD$Date)
max(MPNGF$Date)
max(MRNA$Date)
max(NKLA$Date)
max(RMO$Date)
But when I try either of the following snippets, neither prints or, better yet, stores the most recent dates together with their origin (i.e., ticker):
dataframeObjects <- names(which(unlist(eapply(.GlobalEnv, is.data.frame))))
# Attempt 1
for (i in seq_along(dataframeObjects)) {
  mostRecentDates <- max(dataframeObjects[i]$Date)
}
# Attempt 2
for (i in 1:length(dataframeObjects)) {
  mostRecentDates <- max(dataframeObjects[i]["Date"])
}
Both attempts give [1] NA when I print the variable mostRecentDates.
My question:
- What code is needed to store the most recent dates of all data.frame objects?
Thanks in advance.
Systems used:
- R version: 4.1.1 (2021-08-10)
- RStudio version: 1.4.1717
- OS: macOS Catalina version 10.15.7 and macOS Big Sur version 11.6
CodePudding user response:
I recommend storing the xts objects in an environment other than the global one, which makes them much easier to handle. We can turn that environment into a list and then iterate over that list with purrr::map() or base::lapply().
Here is what that can look like for your example.
library(quantmod)
library(tidyverse)
sym_env <- new.env()
tickers <- c("NKLA", "MPNGF", "RMO", "JD", "COIN")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = sym_env, from = "2020-01-01", to = "2021-02-02")
tickers2 <- c("IBM", "AAPL", "MRNA")
getSymbols.yahoo(tickers2, auto.assign = TRUE, env = sym_env, from = "2020-01-01")
closeAllConnections()
as.list(sym_env) |>
map(fortify.zoo) |>
map(\(x) rename(x, Date=Index)) |>
map(\(x) max(x$Date))
Returns:
$RMO
[1] "2021-02-01"
$NKLA
[1] "2021-02-01"
$JD
[1] "2021-02-01"
$AAPL
[1] "2021-09-28"
$IBM
[1] "2021-09-28"
$MRNA
[1] "2021-09-28"
$MPNGF
[1] "2021-02-01"
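For anyone who prefers not to load the tidyverse, the same idea can be sketched with base::lapply(). This is only a minimal illustration: the tickers AAA and BBB and their values are made up so that the snippet runs without a network connection, and they stand in for data frames that already have a Date column.

```r
# Hypothetical stand-ins for the downloaded and converted ticker data.
sym_env <- new.env()
assign("AAA", data.frame(Date = as.Date("2021-01-28") + 0:2, Close = c(10, 11, 12)),
       envir = sym_env)
assign("BBB", data.frame(Date = as.Date("2021-09-24") + 0:4, Close = 1:5),
       envir = sym_env)

# Turn the environment into a named list, then take the latest date per element.
latest <- lapply(as.list(sym_env), function(x) max(x$Date))

# The order of as.list() on an environment is arbitrary, so index by name.
latest[["AAA"]]
latest[["BBB"]]
```

With the real xts objects, the conversion to data.frame could probably be skipped altogether by calling max(zoo::index(x)) on each element, since the dates live in the index of the object.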
CodePudding user response:
We may get the objects from the intersect of the object names in ls() and the ticker names, use mget to get the values of those objects in a list, loop over the list with lapply, extract the 'Date' column and get the max:
do.call(c, lapply(mget(intersect(c(tickers, tickers2), ls())),
function(x) max(x$Date)))
-output
NKLA MPNGF RMO JD IBM AAPL MRNA
"2021-02-01" "2021-02-01" "2021-02-01" "2021-02-01" "2021-09-28" "2021-09-28" "2021-09-28"
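A short sketch of why do.call(c, ...) is used here rather than sapply(), and of one possible way to store each ticker next to its date. The list lst and the tickers AAA and BBB are hypothetical values chosen for illustration, not data from the question.

```r
# Hypothetical per-ticker maxima (made-up values).
lst <- list(AAA = as.Date("2021-02-01"), BBB = as.Date("2021-09-28"))

flat_c <- do.call(c, lst)         # c() on Dates keeps the Date class and the names
flat_s <- sapply(lst, identity)   # simplification drops the class: plain numbers

class(flat_c)
class(flat_s)

# Store ticker and date side by side in one data.frame.
result <- data.frame(Ticker = names(flat_c), Date = unname(flat_c))
result
```

Keeping the Date class means the stored column can still be sorted or compared as dates later on.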
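In the OP's code, dataframeObjects holds just the names of the objects, not the objects themselves. We need get in the loop to return the value of each object. The result also has to be assigned to a separate element on every iteration, otherwise each pass overwrites the previous one.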
# // in case there are other data.frame objects as well, get the intersect
nm1 <- intersect(dataframeObjects, c(tickers, tickers2))
# // create a `list` to store the output
out <- vector('list', length(nm1))
names(out) <- nm1
for (i in seq_along(nm1)) {
  out[[i]] <- max(get(nm1[i])$Date)
}
-output
> out
$RMO
[1] "2021-02-01"
$NKLA
[1] "2021-02-01"
$JD
[1] "2021-02-01"
$AAPL
[1] "2021-09-28"
$IBM
[1] "2021-09-28"
$MRNA
[1] "2021-09-28"
$MPNGF
[1] "2021-02-01"
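To see where the NA in the question comes from, here is a small sketch with a made-up data frame AAA (a hypothetical stand-in for one converted ticker). It compares indexing the name with looking the object up via get().

```r
# Hypothetical data frame standing in for one converted ticker.
AAA <- data.frame(Date = as.Date(c("2021-09-27", "2021-09-28")))
nm <- "AAA"            # this is what dataframeObjects[i] holds: a character string

nm["Date"]             # NA: an unnamed character vector has no element called "Date"
max(nm["Date"])        # so max() of it is NA as well

max(get(nm)$Date)      # get() fetches the object by name, then $Date works
```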