I have a directory with a set of .rds files containing dataframes:
files <- c("file_2022-11-30.rds", "file_2022-12-01.rds")
I want to read each file into a list and then add a new column to each dataframe in the list containing part of the name of the file it was loaded from (the date). I know how to do this with a for loop, but I'm looking for a more concise solution. I'm sure there's a way to do this with lapply, but this doesn't work:
library(dplyr)
df_list <- lapply(files, readRDS) %>%
lapply(FUN = function(x) mutate(date = as.Date(stringr::str_sub(files[x], start = -14, end = -5)))) %>%
bind_rows()
Desired output would look something like this:
var1 date
1 1 2022-11-30
2 2 2022-11-30
3 2 2022-11-30
4 1 2022-11-30
5 2 2022-11-30
6 2 2022-12-01
7 1 2022-12-01
8 2 2022-12-01
9 1 2022-12-01
10 2 2022-12-01
Answer:
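We can call as.Date on files, with a format string that matches the file names, to get the dates as a Date vector. Then we loop over the files, read each one with readRDS, cbind the matching element of dates onto each data frame with Map, and rbind the list elements: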
dates <- as.Date(files, format = "file_%Y-%m-%d.rds")
do.call(rbind, Map(cbind, lapply(files, readRDS), dates = dates))
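For anyone who wants to try this without the original data, here is a minimal sketch that writes two small files to a temporary directory and then applies the same approach. The sample values and the names tmp and out are made up for illustration. It also assumes files may hold full paths, which is why basename() is applied before parsing the dates.

```r
# Create two small example files in a temporary directory (illustrative data only)
tmp <- tempfile("rds_demo_")
dir.create(tmp)
saveRDS(data.frame(var1 = c(1, 2, 2)), file.path(tmp, "file_2022-11-30.rds"))
saveRDS(data.frame(var1 = c(2, 1)), file.path(tmp, "file_2022-12-01.rds"))

# full.names = TRUE returns paths that readRDS() can open directly
files <- list.files(tmp, pattern = "\\.rds$", full.names = TRUE)

# basename() strips the directory so the format string matches the file name
dates <- as.Date(basename(files), format = "file_%Y-%m-%d.rds")

# Read each file, attach its date as a column, then stack the results
out <- do.call(rbind, Map(cbind, lapply(files, readRDS), dates = dates))
str(out)
```

If the date cannot be parsed from a file name, as.Date returns NA for that file, so checking anyNA(dates) before binding is a cheap safeguard.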
Or, if we prefer the tidyverse:
library(purrr)
library(dplyr)
map2_dfr(files, dates, ~ readRDS(.x) %>%
  mutate(dates = .y))
In the OP's code, files[x] doesn't work because x is not an index. It is the list element, i.e. the data frame returned by readRDS, and it carries no information about the file it came from. The mutate call is also never given x as its data argument. Instead, both steps can be done inside a single lapply, where x is still the file name:
lapply(files, function(x)
  readRDS(x) %>%
    mutate(date = as.Date(stringr::str_sub(x, start = -14, end = -5)))) %>%
  bind_rows()
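To make the point about x concrete, the following base R sketch shows that lapply passes each element of its input to the function rather than its position. The names by_element and by_index are only for illustration.

```r
files <- c("file_2022-11-30.rds", "file_2022-12-01.rds")

# lapply() hands each element to the function, so x is the value itself
by_element <- lapply(files, function(x) x)

# To work by position instead, iterate over the indices explicitly
by_index <- lapply(seq_along(files), function(i) files[i])

# Both lists hold the file names; only the way they were reached differs
identical(by_element, by_index)
```

In the original attempt the second lapply runs over the data frames that were already read, so by that point the file names can only be reached by index (seq_along) or by pairing the two inputs, as Map and map2_dfr do above.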