I have several data frames (dataframe_1, dataframe_2, ...) that I want to loop over in order to apply the same operations to each of them. These operations are:
- Select specific columns:
dataframe_1 <- dataframe_1[, c("Column_1", "Column_2")]
- Rename the columns:
dataframe_1 <- rename(dataframe_1, New_Name_for_Column_1 = Column_1)
- Create new columns. For example, by using the ifelse() function:
dataframe_1$Column_3 <- ifelse(dataframe_1$Column_1 == 5, 1, 0)
I have tested the code on some of the data frames individually without errors.
However, if I execute the following loop:
list_dataframes = list(dataframe_1, dataframe_2)
for (dataframe in 1:length(list_dataframes)){
  dataframe <- dataframe[, c("Column_1", "Column_2")]
  dataframe <- rename(dataframe, New_Name_for_Column_1 = Column_1)
  dataframe$Column_3 <- ifelse(dataframe$Column_1 == 5, 1, 0)
}
The following error arises:
Error in dataframe[, c("Column_1", "Column_2", :
incorrect number of dimensions
(All dataframes have the same column names.)
Any idea?
Thanks!
CodePudding user response:
The code for (dataframe in 1:length(list_dataframes)) iterates over the vector c(1, 2), storing one value at a time in the variable named dataframe. This loop variable is a single integer: it has no dimensions and a length of 1. That is why you cannot subset it with dataframe[, c("Column_1", "Column_2")].
Do this instead, and assign the result back to the same list element: list_dataframes[[dataframe]][, c("Column_1", "Column_2")]
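A minimal sketch of the point above, using made-up toy data in place of the asker's data frames (the values and the Extra column are assumptions for illustration). It captures the error raised by subsetting the loop index, then indexes into the list and stores the result back:

```r
# Toy stand-ins for the asker's data frames (values are invented).
dataframe_1 <- data.frame(Column_1 = c(5, 2), Column_2 = c("a", "b"), Extra = 1:2)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"), Extra = 3:4)
list_dataframes <- list(dataframe_1, dataframe_2)

# The loop variable is just an integer, so two-index subsetting fails.
i <- 1L
msg <- tryCatch(
  i[, c("Column_1", "Column_2")],
  error = function(e) conditionMessage(e)
)
print(msg)

# Index into the list instead and store the result back in the same slot.
for (i in seq_along(list_dataframes)) {
  list_dataframes[[i]] <- list_dataframes[[i]][, c("Column_1", "Column_2")]
}
print(names(list_dataframes[[1]]))
```

seq_along() is used here rather than 1:length() because 1:length(x) yields c(1, 0) when x is empty, which would run the loop body with invalid indices.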
CodePudding user response:
You could iterate over the data frames with purrr::map_dfr(), which applies the steps to each one and row-binds the results into a single data frame, e.g.
list_dataframes = list(dataframe_1, dataframe_2)
library(dplyr)
library(purrr)
list_dataframes %>%
  map_dfr(~ .x %>%
    select(Column_1, Column_2) %>%
    rename(New_Name_for_Column_1 = Column_1) %>%
    # Column_1 no longer exists after rename(), so refer to the new name
    mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0)))
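If the data frames should stay separate instead of being stacked into one, a base R alternative is lapply() with a small helper. This is only a sketch: transform_one is a hypothetical helper name, and the toy data is invented for illustration.

```r
# Toy stand-ins for the asker's data frames (values are invented).
dataframe_1 <- data.frame(Column_1 = c(5, 2), Column_2 = c("a", "b"), Extra = 1:2)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"), Extra = 3:4)
list_dataframes <- list(dataframe_1, dataframe_2)

# Hypothetical helper: applies the three steps to a single data frame.
transform_one <- function(df) {
  df <- df[, c("Column_1", "Column_2")]                              # select columns
  names(df)[names(df) == "Column_1"] <- "New_Name_for_Column_1"      # rename
  df$Column_3 <- ifelse(df$New_Name_for_Column_1 == 5, 1, 0)         # new column
  df
}

# lapply() returns a list with one transformed data frame per input.
result <- lapply(list_dataframes, transform_one)
print(result[[1]])
```

With purrr loaded, map(list_dataframes, transform_one) should behave the same way and also returns a list.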
CodePudding user response:
You are not iterating over the list of data frames, but rather over the sequence 1:length(list_dataframes). Consider the following for illustration:
a = list("a", "b")
for (i in a){print(i)}            # prints the elements: "a", "b"
for (i in 1:length(a)){print(i)}  # prints the indices: 1, 2
In your code, you need to explicitly access the list elements like this:
list_dataframes = list(dataframe_1, dataframe_2)
for (df_number in 1:length(list_dataframes)){
  list_dataframes[[df_number]] <- list_dataframes[[df_number]][, c("Column_1", "Column_2")]
  list_dataframes[[df_number]] <- rename(list_dataframes[[df_number]], New_Name_for_Column_1 = Column_1)
  # Column_1 has been renamed above, so the new name must be used here
  list_dataframes[[df_number]]$Column_3 <- ifelse(list_dataframes[[df_number]]$New_Name_for_Column_1 == 5, 1, 0)
}
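Note that the loop changes the copies stored in list_dataframes, not the original variables dataframe_1 and dataframe_2. If the originals should be updated too, one possible approach is to name the list elements and write them back with list2env(). The sketch below assumes toy data and uses base R for the rename step so that it runs without dplyr:

```r
# Toy stand-ins for the asker's data frames (values are invented).
dataframe_1 <- data.frame(Column_1 = c(5, 2), Column_2 = c("a", "b"), Extra = 1:2)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"), Extra = 3:4)

# Name each element after the variable it came from.
list_dataframes <- list(dataframe_1 = dataframe_1, dataframe_2 = dataframe_2)

for (df_name in names(list_dataframes)) {
  df <- list_dataframes[[df_name]]
  df <- df[, c("Column_1", "Column_2")]
  names(df)[names(df) == "Column_1"] <- "New_Name_for_Column_1"
  df$Column_3 <- ifelse(df$New_Name_for_Column_1 == 5, 1, 0)
  list_dataframes[[df_name]] <- df   # store the result back in the list
}

# Overwrite the original variables with the transformed versions.
invisible(list2env(list_dataframes, envir = environment()))
print(names(dataframe_1))
```

Keeping the data frames in the list is often simpler; writing back to separate variables is only needed if later code depends on those names.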