Add a column to dataframes in a list based on the existence of other columns in R

Time:01-24

I am trying to add a new column to every dataframe in a list (a long list of ~200 dataframes), based on which columns each dataframe contains. Using modified and unmodified versions of the iris dataset as an example, I want to give each dataframe a new column called "species_fixed". The rules I am trying to follow are:

  1. If the column "Species" exists in the dataframe, copy its values into the new column "species_fixed".
  2. If the column "sp" exists in the dataframe, copy its values into "species_fixed".
  3. If neither of these columns exists, make species_fixed a column of all NAs.

Here was my attempt:

library(dplyr)

#Making a couple of dataframes with various structures:

iris_1 <- iris %>% rename(sp = Species)
iris_2 <- iris %>% select(Sepal.Length, Sepal.Width)
iris_3 <- iris %>% mutate(species_2  = Species)

#Making them into a list:

iris_list <- list(iris, iris_1, iris_2, iris_3)

#Attempting to use lapply:

iris_list_fixed <- lapply(iris_list, function(q){
  species_fixed = mutate(ifelse('Species' %in% names(q), Species, ifelse(
'sp' %in% names(q), sp, "NA"))
})

I figure this must require some combination of lapply(), mutate(), ifelse() and possibly other functions, but I can't quite get it to work.
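A minimal illustration (base R, built-in iris data) of why ifelse() is awkward here. Besides the syntax problems in the attempt above (mutate() is called without a data frame as its first argument, and its closing parenthesis is missing), ifelse() returns a result with the same length as its test. A single TRUE/FALSE such as 'Species' %in% names(q) therefore yields a single value, which mutate() would then recycle down every row. A plain if/else does not have this problem:

```r
# ifelse() returns a result with the same length as its test argument.
has_species <- "Species" %in% names(iris)  # a single TRUE

res_ifelse <- ifelse(has_species, as.character(iris$Species), NA_character_)
length(res_ifelse)  # 1: only the first element of the column is kept

# if/else returns the chosen branch unchanged, so the full column survives.
res_if <- if (has_species) as.character(iris$Species) else NA_character_
length(res_if)      # 150
```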

Answer:

Here is a working version based on your own example:

library(dplyr)

#Making a couple of dataframes with various structures:

iris_1 <- iris %>% rename(sp = Species)
iris_2 <- iris %>% select(Sepal.Length, Sepal.Width)
iris_3 <- iris %>% mutate(species_2  = Species)

#Making them into a list:

iris_list <- list(iris, iris_1, iris_2, iris_3)

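#Using lapply:

iris_list_fixed <- lapply(iris_list, function(df) {
  df %>%
    # rowwise() makes ifelse() see one row at a time; with a single TRUE/FALSE
    # test it would otherwise return only the first value of the column
    rowwise() %>%
    mutate(species_fixed = ifelse("Species" %in% names(df), as.character(Species),
                                  ifelse("sp" %in% names(df), as.character(sp),
                                         NA_character_))) %>%  # a real NA, not the string "NA"
    ungroup()  # drop the rowwise grouping from the result
})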
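As a sketch of an alternative to rowwise() (assuming dplyr is installed): a plain if/else inside mutate() is evaluated once per data frame and returns the whole chosen column, so no per-row evaluation is needed. Only the branch that is taken is evaluated, so referring to a column that does not exist in another branch causes no error. The example list is rebuilt here so the block runs on its own.

```r
library(dplyr)

# Rebuild the question's example list so this block is self-contained.
iris_list <- list(
  iris,
  rename(iris, sp = Species),
  select(iris, Sepal.Length, Sepal.Width),
  mutate(iris, species_2 = Species)
)

iris_list_fixed <- lapply(iris_list, function(df) {
  mutate(
    df,
    # if/else runs once per data frame and returns the whole chosen column;
    # branches that are not taken are never evaluated.
    species_fixed = if ("Species" %in% names(df)) {
      as.character(Species)
    } else if ("sp" %in% names(df)) {
      as.character(sp)
    } else {
      NA_character_  # length 1, recycled to every row by mutate()
    }
  )
})
```

On a long list of large data frames this avoids rowwise(), which evaluates the expression once per row and can be noticeably slower.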

Answer:

This is broadly similar to the other answer, but it may be tidier to define a named function and assign the new column directly with $<- (outside of any call to mutate()):

library(dplyr)

add_fixed <- function(df) {
  # Copy the first of Species / sp that exists; otherwise fill with NA
  if ("Species" %in% names(df)) df$species_fixed <- df$Species
  else if ("sp" %in% names(df)) df$species_fixed <- df$sp
  else df$species_fixed <- NA_character_

  df
}

iris_species_fixed <- lapply(iris_list, add_fixed)
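If some of the ~200 data frames use other spellings, the same idea can be driven by a vector of candidate names that is checked in order. add_fixed_from() and its candidates argument are hypothetical names for this sketch. It also converts the result to character (a design choice of this sketch, not part of the answer above), so every data frame ends up with the same column type instead of factor in some and character in others.

```r
# Hypothetical generalisation of add_fixed(): the candidate column names are
# checked in order and the first one present is copied into species_fixed.
add_fixed_from <- function(df, candidates = c("Species", "sp")) {
  hit <- intersect(candidates, names(df))  # keeps the order of `candidates`
  df$species_fixed <- if (length(hit) > 0) {
    as.character(df[[hit[1]]])  # [[ ]] matches column names exactly
  } else {
    NA_character_               # recycled to every row
  }
  df
}

# Example list built from the question's data (base R only)
iris_list <- list(
  iris,
  setNames(iris, sub("^Species$", "sp", names(iris))),
  iris[, c("Sepal.Length", "Sepal.Width")],
  transform(iris, species_2 = Species)
)

iris_species_fixed <- lapply(iris_list, add_fixed_from)
```

Adding another alias later only means extending the candidates vector, for example lapply(iris_list, add_fixed_from, candidates = c("Species", "sp", "species")).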

Answer:

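You could also use the %||% infix operator (exported by purrr, and part of base R since version 4.4.0). x %||% y returns x unless it is NULL, in which case it returns y. Since x$Species is NULL when that column does not exist, the chain below tries Species first, then sp, and falls back to NA if neither is present. (The \(x) shorthand for function(x) requires R 4.1.0 or later.)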

library(purrr)

map(
  iris_list,
  \(x) {
    # x$Species / x$sp are NULL when the column is missing, so %||% falls through
    x$species_fixed <- x$Species %||% x$sp %||% NA
    x
  } 
)
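One caveat, shown with a small made-up data frame: on a data.frame, $ falls back to partial matching when there is no exact match (without a warning by default), so x$sp can silently return a column such as species_2. This does not affect the example list here, since iris_3 still has Species, but it could in a long list of real data frames. Using [[ ]], which matches names exactly by default, avoids it. The block defines %||% locally with the same semantics as the base R and purrr versions, so it also runs on R versions older than 4.4.0.

```r
# Neither Species nor sp exists, but a column name starts with "sp".
d <- data.frame(species_2 = c("setosa", "virginica"), stringsAsFactors = FALSE)

# Local null-default operator: return y when x is NULL, otherwise x.
`%||%` <- function(x, y) if (is.null(x)) y else x

# `$` partially matches "sp" to "species_2", so this picks up the wrong column.
with_dollar <- d$Species %||% d$sp %||% NA

# `[[` matches exactly, so both lookups give NULL and the result is NA.
with_brackets <- d[["Species"]] %||% d[["sp"]] %||% NA
```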