I am trying to add a new column to every dataframe in a list (a long list of ~200 dataframes), based on which columns each dataframe already has. Using modified and unmodified versions of the iris dataset as an example, I want to give each dataframe a new column called "species_fixed". The rules I am trying to follow are:
- If the column "Species" exists in the dataframe, copy its values into the new column "species_fixed".
- If the column "sp" exists in the dataframe, copy its values into the new column "species_fixed".
- If neither of these column names exists, make a species_fixed column that is all NAs.
Here was my attempt:
library(dplyr)
#Making a couple dataframes with various structures:
iris_1 <- iris %>% rename(sp = Species)
iris_2 <- iris %>% select(Sepal.Length, Sepal.Width)
iris_3 <- iris %>% mutate(species_2 = Species)
#Making them into a list:
iris_list <- list(iris, iris_1, iris_2, iris_3)
#Attempting to use lapply:
iris_list_fixed <- lapply(iris_list, function(q){
species_fixed = mutate(ifelse('Species' %in% names(q), Species, ifelse(
'sp' %in% names(q), sp, "NA"))
})
I figure this must require some combination of lapply(), mutate(), ifelse() and potentially other functions, but I can't quite seem to land it.
CodePudding user response:
Here is a working version based on your own example:
library(dplyr)
#Making a couple dataframes with various structures:
iris_1 <- iris %>% rename(sp = Species)
iris_2 <- iris %>% select(Sepal.Length, Sepal.Width)
iris_3 <- iris %>% mutate(species_2 = Species)
#Making them into a list:
iris_list <- list(iris, iris_1, iris_2, iris_3)
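#Using lapply to add species_fixed to every dataframe:
iris_list_fixed <- lapply(iris_list, function(df){
  df %>%
    rowwise() %>% #evaluate ifelse() one row at a time, since the test is a single TRUE/FALSE
    mutate(species_fixed = ifelse('Species' %in% names(df), as.character(Species),
                                  ifelse('sp' %in% names(df), as.character(sp), NA_character_) #a real NA, not the string "NA"
    )
    ) %>%
    ungroup() #drop the rowwise grouping before returning
})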
CodePudding user response:
Broadly similar to the other answer, but it might be tidier to define a named function and assign the new column directly (outside of calls to mutate()):
library(dplyr)
add_fixed <- function(df) {
if ("Species" %in% names(df)) df$species_fixed <- df$Species
else if ("sp" %in% names(df)) df$species_fixed <- df$sp
else df$species_fixed <- NA_character_
df
}
iris_species_fixed <- lapply(iris_list, add_fixed)
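One thing to be aware of: the function above copies Species or sp as a factor but fills the fallback with a character NA, so the column type can differ between dataframes. If a consistent type matters (for example, before binding the list together), a variant along these lines might help. This is only a sketch: add_fixed_chr is a hypothetical name, and the example list is rebuilt in base R so the block runs on its own.

```r
# Rebuild the example list from the question (base R only).
iris_1 <- iris
names(iris_1)[names(iris_1) == "Species"] <- "sp"
iris_2 <- iris[, c("Sepal.Length", "Sepal.Width")]
iris_3 <- iris
iris_3$species_2 <- iris_3$Species
iris_list <- list(iris, iris_1, iris_2, iris_3)

# Hypothetical variant of add_fixed(): same rules, but always a character column.
add_fixed_chr <- function(df) {
  if ("Species" %in% names(df)) {
    df$species_fixed <- as.character(df$Species)
  } else if ("sp" %in% names(df)) {
    df$species_fixed <- as.character(df$sp)
  } else {
    df$species_fixed <- NA_character_
  }
  df
}

iris_species_chr <- lapply(iris_list, add_fixed_chr)

# Class of the new column in each dataframe.
col_classes <- vapply(iris_species_chr,
                      function(df) class(df$species_fixed), character(1))
# Whether the new column is entirely NA in each dataframe.
all_na <- vapply(iris_species_chr,
                 function(df) all(is.na(df$species_fixed)), logical(1))
```

Here col_classes and all_na are just quick checks: every new column should be character, and only the dataframe with neither Species nor sp should be all NA.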
CodePudding user response:
I wonder if you could use the %||% infix operator. If the first column is not present (so $ returns NULL), it tries the second, and if that is not present either, it uses NA.
library(purrr)
map(
iris_list,
\(x) {
x$species_fixed <- x$Species %||% x$sp %||% NA
x
}
)
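One caveat with the $ version: on a dataframe, $ does partial name matching, so x$sp can silently pick up a column such as species_2 when there is no sp column. Using [[ ]] instead, which matches exactly by default, avoids that. A small sketch of the difference follows; the dataframe df is made up for illustration, and %||% is defined locally only so the block runs without purrr.

```r
# `%||%` returns its left side unless that is NULL. Defined here so the
# sketch runs without purrr (purrr exports an equivalent operator).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical dataframe: no "sp" column, but "species_2" starts with "sp".
df <- data.frame(Sepal.Length = c(5.1, 4.9),
                 species_2 = c("setosa", "setosa"))

# `$` partially matches "sp" to "species_2", so the fallback never triggers.
partial <- df$sp %||% NA

# `[[` matches exactly, returns NULL for the missing column, and falls back to NA.
exact <- df[["sp"]] %||% NA
```

With that in mind, the assignment in the function above could be written as x[["Species"]] %||% x[["sp"]] %||% NA if columns with similar prefixes might be present in the list.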