In R dataframe, consolidate 5 columns with similar names into a single dataframe column-CodePudding

Given the following dataframe:

zed = data.frame(Title = c('a', NA, NA, NA, NA, NA), TITLE = c(NA, 'b', NA, NA, NA, NA), "Title/Description" = c(NA, NA, 'c', 'd', NA, NA), title = c(NA, NA, NA, NA, 'e', 'f'))

We'd like to consolidate everything down into one Title column. We can assume that in each row, there will only be 1 value that is not NA, and we are hunting for that value to add into the Title column. We can consolidate this down via the following approach:

yah <- zed %>% 
  dplyr::mutate(Title = ifelse(!is.na(Title), Title,
                               ifelse(!is.na(`Title.Description`), `Title.Description`,
                                      ifelse(!is.na(TITLE), TITLE,
                                             ifelse(!is.na(title), title, ""))))) %>%
  dplyr::select(-Title.Description, -TITLE, -title)

however this approach is messy, and becomes even messier if the dataframe has 10 title columns. this approach also breaks if a column is not present in the data:

# there's no TITLE2 --> ERROR
yah <- zed %>% 
  dplyr::mutate(Title = ifelse(!is.na(Title), Title,
                               ifelse(!is.na(`Title.Description`), `Title.Description`,
                                      ifelse(!is.na(TITLE2), TITLE2,
                                             ifelse(!is.na(title), title, ""))))) %>%
  dplyr::select(-Title.Description, -TITLE2, -title)

any recommendations on how to (a) eliminate the nasty ifelse() nesting and (b) not throw an error if one of the columns is omitted.

CodePudding user response：

dplyr has the coalesce function that takes the first non-missing value:

yah <- zed %>%
  mutate(Title = coalesce(
      Title, Title.Description, TITLE2, title, ""
    )
  )

This will still break if one of the columns doesn't exist. I'd recommend identifying any missing columns and setting them to NA before running this.

CodePudding user response：

Another option is to select only the columns that contain the the word title (regardless of case), then use coalesce. I also give the new column a different name, so that I could remove all the columns that contain title, then rename it.

library(tidyverse)

yah <- zed %>%
  mutate(temp = coalesce(!!!select(., contains("title")))) %>% 
  select(-contains("title")) %>% 
  rename(Title = temp)

Output

yah

  Title
1     a
2     b
3     c
4     d
5     e
6     f

Another option is to use max with c_across:

zed %>%
  rowwise() %>%
  mutate(temp = max(c_across(contains("title")), na.rm = TRUE)) %>% 
  select(-contains("title")) %>% 
  rename(Title = temp)

Or another option is to use reduce with pmax

zed %>% 
  mutate(temp = reduce(select(., contains("title")), pmax, na.rm = TRUE)) %>% 
  select(-contains("title")) %>% 
  rename(Title = temp)