Find most recent date in multiple data.frame objects of unequal sizes in R

Time:09-30

I have multiple data.frame objects of unequal lengths. I would like to find the most recent date in all of them and store the data somewhere.

Here is an example of hopefully reproducible code to illustrate what I would like (with comments and sources). This gives 7 data.frame objects of variable lengths:

library(quantmod)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c("NKLA", "MPNGF", "RMO", "JD", "COIN")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = globalenv(), from = "2020-01-01", to = "2021-02-02")

# Load ticker data from 2020-01-01 till yesterday (if not weekend or holiday)
tickers2 <- c("IBM", "AAPL", "MRNA")
getSymbols.yahoo(tickers2, auto.assign = TRUE, env = globalenv(), from = "2020-01-01")

# Close all Internet connections as a precaution
# https://stackoverflow.com/a/52758758/2950721
closeAllConnections()

# Find xts objects
xtsObjects <- names(which(unlist(eapply(.GlobalEnv, is.xts))))

# Convert xts to data.frame
# https://stackoverflow.com/a/69246047/2950721
for (i in seq_along(xtsObjects)) {
  assign(xtsObjects[i], fortify.zoo(get(xtsObjects[i])))
}

# 1st column name from Index to Date
# https://stackoverflow.com/a/69292036/2950721
for (i in seq_along(xtsObjects)) {
  tmp <- get(xtsObjects[i])
  colnames(tmp)[colnames(tmp) == "Index"] <- "Date"
  assign(xtsObjects[i], tmp)
}
remove(tmp)

Retrieving the dates individually is pretty straightforward:

max(AAPL$Date)
max(IBM$Date)
max(JD$Date)
max(MPNGF$Date)
max(MRNA$Date)
max(NKLA$Date)
max(RMO$Date)

But neither of the following attempts prints or, better yet, stores the most recent dates together with their origin (i.e., the ticker):

dataframeObjects <- names(which(unlist(eapply(.GlobalEnv, is.data.frame))))

# Attempt 1
for (i in seq_along(dataframeObjects)) {
  mostRecentDates <- max(dataframeObjects[i]$Date)
}

# Attempt 2
for (i in 1:length(dataframeObjects)) {
  mostRecentDates <- max(dataframeObjects[i]["Date"])
}

Both attempts give [1] NA when I print the variable mostRecentDates.

My question:

  • What code is needed in order to store the most recent dates of all data.frame objects?

Thanks in advance.


Systems used:

  • R version: 4.1.1 (2021-08-10)
  • RStudio version: 1.4.1717
  • OS: macOS Catalina version 10.15.7 and macOS Big Sur version 11.6

Answer:

I recommend storing the xts objects in an environment other than the global one, which makes them much easier to handle. We can turn that environment into a list and then iterate over that list with purrr::map() or base::lapply().

Here is what that can look like for your example.

library(quantmod)
library(tidyverse)
sym_env <- new.env()

tickers <- c("NKLA", "MPNGF", "RMO", "JD", "COIN")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = sym_env, from = "2020-01-01", to = "2021-02-02")

tickers2 <- c("IBM", "AAPL", "MRNA")
getSymbols.yahoo(tickers2, auto.assign = TRUE, env = sym_env, from = "2020-01-01")

closeAllConnections()

as.list(sym_env) |> 
  map(fortify.zoo) |> 
  map(\(x) rename(x, Date=Index)) |> 
  map(\(x) max(x$Date))

Returns:

$RMO
[1] "2021-02-01"

$NKLA
[1] "2021-02-01"

$JD
[1] "2021-02-01"

$AAPL
[1] "2021-09-28"

$IBM
[1] "2021-09-28"

$MRNA
[1] "2021-09-28"

$MPNGF
[1] "2021-02-01"
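The same environment-to-list pattern can be tried offline with base R only. The following is a minimal sketch under stated assumptions: AAA and BBB are hypothetical data frames with made-up dates that stand in for the downloaded tickers, and they already carry a Date column, so the fortify.zoo() and rename() steps are skipped.

```r
# Hypothetical stand-ins for the downloaded tickers (dates are made up).
sym_env <- new.env()
assign("AAA", data.frame(Date = as.Date(c("2021-01-04", "2021-01-05")),
                         Close = c(10, 11)), envir = sym_env)
assign("BBB", data.frame(Date = as.Date(c("2021-03-01", "2021-03-02", "2021-03-03")),
                         Close = c(20, 21, 22)), envir = sym_env)

# Turn the environment into a named list and take the latest date of each element.
latest <- lapply(as.list(sym_env), function(x) max(x$Date))

# as.list() on an environment gives no guaranteed order, so sort by name.
latest <- latest[order(names(latest))]
print(latest)
```

Because the data frames have different numbers of rows, each max() is computed on its own element; nothing needs to be padded or aligned.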

Answer:

We can take the intersection of the object names returned by ls() and the ticker names, use mget to collect those objects in a list, loop over the list with lapply, extract the 'Date' column, and take its max:

do.call(c, lapply(mget(intersect(c(tickers, tickers2), ls())), 
       function(x) max(x$Date)))

Output:

        NKLA        MPNGF          RMO           JD          IBM         AAPL         MRNA 
"2021-02-01" "2021-02-01" "2021-02-01" "2021-02-01" "2021-09-28" "2021-09-28" "2021-09-28" 
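If the goal is to store each date next to the object it came from, the result can also be kept in a data frame. This is only a sketch: AAA, BBB and CCC are hypothetical data frames with made-up dates, and the column names ticker and latest_date are chosen here for illustration.

```r
# Hypothetical data frames standing in for the ticker objects (dates are made up).
AAA <- data.frame(Date = as.Date(c("2021-01-04", "2021-01-05")))
BBB <- data.frame(Date = as.Date(c("2021-03-01", "2021-03-03")))
CCC <- data.frame(Date = as.Date("2021-02-01"))

# Fetch the objects by name into a named list (looked up in the current environment).
objs <- mget(c("AAA", "BBB", "CCC"), envir = environment())

# One row per object: its name and its most recent date (kept as class Date).
latest <- data.frame(
  ticker = names(objs),
  latest_date = do.call(c, lapply(unname(objs), function(x) max(x$Date)))
)
print(latest)

# The overall most recent date and the object it comes from.
newest <- latest[which.max(as.numeric(latest$latest_date)), ]
print(newest)
```

Combining the dates with do.call(c, ...) rather than unlist() keeps them as Date values, so they can still be compared and sorted.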

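In the OP's code, dataframeObjects holds only the names of the objects (a character vector), not the data frames themselves, so indexing it never reaches a Date column. We need get() in the loop to return the object behind each name: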

# In case there are other data.frame objects as well, take the intersection
nm1 <- intersect(dataframeObjects, c(tickers, tickers2))
# Create a list to store the output
out <- vector('list', length(nm1))
names(out) <- nm1
for(i in seq_along(nm1)) {
   out[[i]] <- max(get(nm1[i])$Date)
}

Output:

> out
$RMO
[1] "2021-02-01"

$NKLA
[1] "2021-02-01"

$JD
[1] "2021-02-01"

$AAPL
[1] "2021-09-28"

$IBM
[1] "2021-09-28"

$MRNA
[1] "2021-09-28"

$MPNGF
[1] "2021-02-01"
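The difference between a name and the object it refers to can be seen in isolation. A minimal sketch, assuming a single hypothetical data frame AAA with made-up dates; obj_names plays the role of dataframeObjects:

```r
# Hypothetical data frame standing in for one ticker object (dates are made up).
AAA <- data.frame(Date = as.Date(c("2021-01-04", "2021-01-05")))

obj_names <- "AAA"   # a character vector of object names, not the objects

# Indexing the character vector by "Date" looks for an element named "Date".
# There is none, so the result is a missing string, and max() of that is NA.
by_name <- obj_names[1]["Date"]
print(is.na(max(by_name)))

# get() looks up the object that the name refers to, so the column is reachable.
by_object <- max(get(obj_names[1])$Date)
print(by_object)
```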