Using substring of file name to create new variable in a list of dataframes

Time:12-07

I have a directory with a set of .rds files containing dataframes:

files <- c("file_2022-11-30.rds", "file_2022-12-01.rds")

I want to read each file into a list and then add a new column to each dataframe in the list that contains part of the name of the file it was loaded from (the date). I know how to do this with a for loop, but I'm looking for a more concise solution. I'm sure there's a way to do this with lapply, but this doesn't work:

library(dplyr)

df_list <- lapply(files, readRDS) %>%
  lapply(FUN = function(x) mutate(date = as.Date(stringr::str_sub(files[x], start = -14, end = -5)))) %>%
bind_rows()

Desired output would look something like this:

   var1       date
1     1 2022-11-30
2     2 2022-11-30
3     2 2022-11-30
4     1 2022-11-30
5     2 2022-11-30
6     2 2022-12-01
7     1 2022-12-01
8     2 2022-12-01
9     1 2022-12-01
10    2 2022-12-01

CodePudding user response:

We can parse the dates directly from the file names with as.Date, using a format string that matches the whole file name. Then read each file with readRDS, cbind the matching date onto each dataframe with Map, and rbind the list elements:

dates <- as.Date(files, format = "file_%Y-%m-%d.rds")
do.call(rbind, Map(cbind, lapply(files, readRDS), dates = dates))
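
To try the base R approach end to end, here is a minimal sketch. The temporary directory and the sample `var1` values are made up for illustration, and `basename()` is added on the assumption that the files are listed with their full paths, so the format string is matched against the file name only.

```r
# Hypothetical sample data written to a temporary directory
tmp_dir <- tempfile("rds_demo_")
dir.create(tmp_dir)
saveRDS(data.frame(var1 = c(1, 2, 2)), file.path(tmp_dir, "file_2022-11-30.rds"))
saveRDS(data.frame(var1 = c(2, 1, 2)), file.path(tmp_dir, "file_2022-12-01.rds"))

# list.files() returns the paths in alphabetical order
files <- list.files(tmp_dir, pattern = "\\.rds$", full.names = TRUE)

# Parse the date from the file name only, so the directory part is ignored
dates <- as.Date(basename(files), format = "file_%Y-%m-%d.rds")

# Read each file, attach its date as a column, then stack the results
result <- do.call(rbind, Map(cbind, lapply(files, readRDS), dates = dates))
str(result)
```

Because Map walks over the dataframes and the dates in parallel, each dataframe is paired with the date parsed from its own file name.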

Or, if we want to use the tidyverse:

library(purrr)
library(dplyr)
map2_dfr(files, dates, ~ readRDS(.x) %>%
  mutate(dates = .y))

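In the OP's code, files[x] doesn't work because x is not an index. It is the list element, i.e. the dataframe returned by readRDS, and it carries no information about the file it came from. The mutate call also never receives the dataframe x as its first argument. Instead, we can read the file and add the column in a single lapply over the file names: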

lapply(files, function(x)
  readRDS(x) %>%
    mutate(date = as.Date(stringr::str_sub(x, start = -14, end = -5)))) %>%
  bind_rows()
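
A different technique from the ones above, offered only as a sketch: name the list elements after their files and let bind_rows carry those names into a column through its .id argument, then parse the date from that column. The temporary directory, the sample data, and the helper column name `source_file` are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical sample data written to a temporary directory
tmp_dir <- tempfile("rds_demo_")
dir.create(tmp_dir)
saveRDS(data.frame(var1 = c(1, 2, 2)), file.path(tmp_dir, "file_2022-11-30.rds"))
saveRDS(data.frame(var1 = c(2, 1, 2)), file.path(tmp_dir, "file_2022-12-01.rds"))

files <- list.files(tmp_dir, pattern = "\\.rds$", full.names = TRUE)

combined <- files %>%
  setNames(basename(files)) %>%          # name each path by its file name
  lapply(readRDS) %>%                    # the names carry over to the list
  bind_rows(.id = "source_file") %>%     # list names become a character column
  mutate(date = as.Date(source_file, format = "file_%Y-%m-%d.rds")) %>%
  select(-source_file)                   # drop the helper column

str(combined)
```

This keeps the file name attached to the data until after the rows are bound, so the date is parsed once over the whole column rather than once per file.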