Name a variable or object based on the value of another variable in R

Time:06-30

I read data files from a directory where I don't know the number or the names of the files in advance. Each file is a data frame (stored as a parquet file). I can read those files, but how do I name the results?

I would like to have something like a named list where the filename is the name of the element. I don't know how to do this in R. In Python I would use a dictionary like this:

import pandas as pd

file_names = ['A.parquet', 'B.parquet']

all_data = {}

for fn in file_names:
    data = pd.read_parquet(fn)
    all_data[fn] = data

How can I solve this in R?

library("arrow")

file_names = c('a.parquet', 'B.parquet')

# "named vector"?
daten = c()

for (pf in file_names) {
    # name of data frame (filename without suffix)
    df_name <- strsplit(pf, ".", fixed=TRUE)[[1]][1]

    df <- arrow::read_parquet(pf)

    daten[df_name] = df
}

This doesn't work; I get this error:

number of items to replace is not a multiple of replacement length

CodePudding user response:

Each arrow::read_parquet() call returns a data frame. You want to store the results of your loop using a list of data frames. In particular, you are looking for a named list.

file_names <- c('a.parquet', 'B.parquet')

## loop through files (can be replaced by a one-line `lapply` call)
daten <- list()  ## not c()
for (i in seq_along(file_names)) {
  daten[[i]] <- arrow::read_parquet(file_names[i])
}

## grab filename without suffix
names(daten) <- sub("\\.parquet$", "", file_names)

To access a list element by name, use daten[["a"]] and daten[["B"]].
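
If the dictionary-style loop from the question is preferred, an element can also be assigned by name inside the loop. The key point is to use double brackets: with single brackets, R treats the data frame as a list of columns and tries to spread them over one slot, which is what triggers the "number of items to replace" message. The following is only a sketch: fake_read() is a hypothetical stand-in for arrow::read_parquet() so that it runs without any files.

```r
# Hypothetical stand-in for arrow::read_parquet(); returns a small data frame
fake_read <- function(path) data.frame(source = path, value = 1:3)

file_names <- c("a.parquet", "B.parquet")

daten <- list()
for (pf in file_names) {
  # element name: file name without directory and extension
  df_name <- tools::file_path_sans_ext(basename(pf))
  # double brackets store the whole data frame as ONE list element
  daten[[df_name]] <- fake_read(pf)
}

names(daten)
nrow(daten[["B"]])
```

This grows the list one element at a time, which is fine for a handful of files.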


Remark: Since the length of the list is known, it is better to initialize it with a fixed length, so that the list does not grow in size during the loop.

daten <- vector("list", length(file_names))

In addition, if you know the lapply function, you can replace the loop with the following, so that you don't need to initialize the list at all.

daten <- lapply(file_names, arrow::read_parquet)

As a result, the code can be shortened to:

daten <- lapply(file_names, arrow::read_parquet)
names(daten) <- sub("\\.parquet$", "", file_names)
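
Since the question says the number and names of the files are not known in advance, the file names could be discovered with list.files() first. Below is a rough sketch of that idea. The helper read_dir_as_list() and its arguments are made up for illustration, and the demo uses CSV files in a temporary directory only so that it runs without arrow installed.

```r
# Hypothetical helper: read every file in `dir` matching `pattern` into a named list.
# `reader` is any function that reads one file, e.g. arrow::read_parquet.
read_dir_as_list <- function(dir, reader, pattern = "\\.parquet$") {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  daten <- lapply(paths, reader)
  # element names: file names without directory and extension
  names(daten) <- tools::file_path_sans_ext(basename(paths))
  daten
}

# Demo with CSV files (an assumption made only to keep the sketch self-contained)
demo_dir <- file.path(tempdir(), "named_list_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "B.csv"), row.names = FALSE)

daten <- read_dir_as_list(demo_dir, reader = read.csv, pattern = "\\.csv$")
names(daten)
```

For the parquet case the call would presumably look like read_dir_as_list("some/dir", reader = arrow::read_parquet), with the default pattern.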

CodePudding user response:

You can use named lists like so.

You can either use the names directly

sapply(file_names, arrow::read_parquet, USE.NAMES = TRUE, simplify = FALSE)

or set them after with whatever function you want to apply

setNames(lapply(file_names, arrow::read_parquet), stringr::str_extract(file_names, '^.+(?=\\.)'))
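
To see how the two variants differ in the names they produce, here is a small sketch. fake_read() is a hypothetical stand-in for arrow::read_parquet(), and tools::file_path_sans_ext() is used as one possible naming function; neither is part of the answer above.

```r
file_names <- c("a.parquet", "B.parquet")

# Hypothetical stand-in reader so the sketch runs without arrow or real files
fake_read <- function(path) data.frame(source = path)

# Variant 1: sapply() names the elements after the input strings themselves
by_full_name <- sapply(file_names, fake_read, USE.NAMES = TRUE, simplify = FALSE)
names(by_full_name)

# Variant 2: setNames() lets any function of the file names supply the names
by_stem <- setNames(lapply(file_names, fake_read),
                    tools::file_path_sans_ext(file_names))
names(by_stem)
```

With the first variant the elements are accessed as by_full_name[["a.parquet"]], with the second as by_stem[["a"]].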

CodePudding user response:

In the tidyverse you would use purrr. This is basically the same as the lapply() or sapply() approach, but in a different ecosystem.

library(arrow)
library(purrr)

file_names = c('a.parquet', 'B.parquet')

daten <- file_names %>% 
  set_names(tools::file_path_sans_ext) %>% 
  map(read_parquet)

You would access each list item through the usual ways.

daten$a
daten$B

# or

daten[["a"]]
daten[["B"]]