Loop over several dataframes to do several actions in R

Time:05-23

I have several dataframes (dataframe_1, dataframe_2, ...) that I want to loop over in order to apply the same functions to all of them. These functions are:

  • Select specific columns:
dataframe_1 <- dataframe_1[, c("Column_1", "Column_2")]

  • Rename the columns:
dataframe_1 <- rename(dataframe_1, New_Name_for_Column_1 = Column_1)

  • Create new columns, for example with the ifelse() function:
dataframe_1$Column_3 <- ifelse(dataframe_1$Column_1 == 5, 1, 0)
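The three steps above can be bundled into one function and checked on a single dataframe before any looping. This is a minimal base-R sketch assuming toy data; `process_df` is a hypothetical helper name, base R's `names<-` stands in for dplyr's `rename()`, and the new column is created before the rename so that `Column_1` still exists when `ifelse()` refers to it.

```r
# Toy data standing in for the real dataframes (hypothetical values)
dataframe_1 <- data.frame(Column_1 = c(5, 2, 5), Column_2 = c("a", "b", "c"), Other = 1:3)

# Hypothetical helper: applies the three steps to one dataframe
process_df <- function(df) {
  # 1. Keep only the columns of interest
  df <- df[, c("Column_1", "Column_2")]
  # 2. Create the indicator column while Column_1 still has its old name
  df$Column_3 <- ifelse(df$Column_1 == 5, 1, 0)
  # 3. Rename Column_1 (base R equivalent of dplyr::rename())
  names(df)[names(df) == "Column_1"] <- "New_Name_for_Column_1"
  df
}

result <- process_df(dataframe_1)
print(result)
```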

I have tested this code on individual dataframes without errors.

However, if I execute the following loop:

list_dataframes = list(dataframe_1, dataframe_2)

for (dataframe in 1:length(list_dataframes)){
 dataframe <- dataframe[, c("Column_1", "Column_2")]
 dataframe <- rename(dataframe, New_Name_for_Column_1 = Column_1)
 dataframe$Column_3 <- ifelse(dataframe$Column_1 == 5, 1, 0)
}

The following error arises:

Error in dataframe[, c("Column_1", "Column_2",  : 
  incorrect number of dimensions

(All dataframes have the same column names.)

Any idea?

Thanks!

Answer:

The code for (dataframe in 1:length(list_dataframes)) loops over the integer vector c(1, 2), storing one value at a time in a variable named dataframe. This iteration variable is a plain number (a length-1 vector with no dimensions), not a dataframe, which is why you cannot subset it with dataframe[, c("Column_1", "Column_2")]. Index into the list instead: list_dataframes[[dataframe]][, c("Column_1", "Column_2")], and assign the result back to list_dataframes[[dataframe]] so that the change is kept.
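A minimal sketch that reproduces the error and shows the difference, assuming two hypothetical one-column dataframes. `tryCatch()` with `conditionMessage` simply captures the error text instead of stopping the script.

```r
# Two hypothetical dataframes in a list
list_dataframes <- list(data.frame(Column_1 = 1:2), data.frame(Column_1 = 3:4))

for (dataframe in 1:length(list_dataframes)) {
  print(class(dataframe))  # "integer": the loop variable is an index, not a dataframe
}

# Subsetting a plain number with [ , ] reproduces the error from the question
msg <- tryCatch(dataframe[, "Column_1"], error = conditionMessage)
print(msg)

# Indexing the list with [[ ]] returns the actual dataframe
first_df <- list_dataframes[[1]]
print(class(first_df))
```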

Answer:

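You could instead iterate over the dataframes with purrr::map_dfr(), which applies the same pipeline to each dataframe and row-binds the results into a single dataframe, e.g.: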

library(dplyr)
library(purrr)

list_dataframes = list(dataframe_1, dataframe_2)

list_dataframes %>% 
  map_dfr(~.x %>% 
            select(Column_1, Column_2) %>% 
            rename(New_Name_for_Column_1 = Column_1) %>% 
            # After rename(), the column must be referred to by its new name
            mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0)))
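If you want to keep the dataframes separate, or avoid the dplyr/purrr dependency, a base-R sketch of the same idea uses lapply() (one result per input, like purrr::map()) and optionally do.call(rbind, ...) to stack them (like map_dfr()). The toy data and the `clean_df` helper name are hypothetical.

```r
# Hypothetical toy data with the same column names
dataframe_1 <- data.frame(Column_1 = c(5, 1), Column_2 = c("a", "b"))
dataframe_2 <- data.frame(Column_1 = c(2, 5), Column_2 = c("c", "d"))
list_dataframes <- list(dataframe_1, dataframe_2)

# Hypothetical helper with the same steps as the pipeline above
clean_df <- function(df) {
  df <- df[, c("Column_1", "Column_2")]
  df$Column_3 <- ifelse(df$Column_1 == 5, 1, 0)
  names(df)[names(df) == "Column_1"] <- "New_Name_for_Column_1"
  df
}

# lapply() keeps one processed dataframe per input
processed <- lapply(list_dataframes, clean_df)

# do.call(rbind, ...) stacks them into a single dataframe
combined <- do.call(rbind, processed)
print(combined)
```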

Answer:

You are not iterating over the list of dataframes, but rather over a sequence 1:length(list_dataframes). Consider the following for illustration:

a = list("a", "b")
for (i in a){print(i)}            # prints the elements: "a", then "b"
for (i in 1:length(a)){print(i)}  # prints the indices: 1, then 2
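When you do need indices, seq_along() is generally safer than 1:length(): both give 1, 2 for a non-empty list, but they disagree on an empty one. A short illustration, where `empty` is a hypothetical empty list:

```r
a <- list("a", "b")
empty <- list()

print(1:length(a))        # 1 2
print(seq_along(a))       # 1 2

# With an empty list, 1:length() counts down and yields c(1, 0) ...
print(1:length(empty))    # 1 0
# ... whereas seq_along() yields an empty sequence, so a loop body never runs
print(seq_along(empty))   # integer(0)
```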

In your code, you need to explicitly access the list elements like this:

library(dplyr)  # for rename()

list_dataframes = list(dataframe_1, dataframe_2)

for (df_number in seq_along(list_dataframes)){
  list_dataframes[[df_number]] <- list_dataframes[[df_number]][, c("Column_1", "Column_2")]
  list_dataframes[[df_number]] <- rename(list_dataframes[[df_number]], New_Name_for_Column_1 = Column_1)
  # Column_1 has been renamed at this point, so use the new name
  list_dataframes[[df_number]]$Column_3 <- ifelse(list_dataframes[[df_number]]$New_Name_for_Column_1 == 5, 1, 0)
}
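Note that modifying list_dataframes leaves dataframe_1 and dataframe_2 themselves unchanged, because list() stores copies of them. If the original variables should be overwritten, one option is a named list combined with base R's list2env(). This is a sketch under that assumption: the toy data is hypothetical, base R's `names<-` stands in for dplyr's rename(), and it loops over names rather than indices.

```r
# Hypothetical toy data
dataframe_1 <- data.frame(Column_1 = c(5, 1), Column_2 = c("a", "b"))
dataframe_2 <- data.frame(Column_1 = c(2, 5), Column_2 = c("c", "d"))

# Naming the list elements lets them be written back under their original names
list_dataframes <- list(dataframe_1 = dataframe_1, dataframe_2 = dataframe_2)

for (df_name in names(list_dataframes)) {
  current <- list_dataframes[[df_name]][, c("Column_1", "Column_2")]
  current$Column_3 <- ifelse(current$Column_1 == 5, 1, 0)
  names(current)[names(current) == "Column_1"] <- "New_Name_for_Column_1"
  list_dataframes[[df_name]] <- current
}

# The loop only changed the list; dataframe_1 still has its original columns
print(names(dataframe_1))

# Copy each list element back to a variable of the same name in the current
# environment (the global environment when run at top level)
invisible(list2env(list_dataframes, envir = environment()))
print(names(dataframe_1))
```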