Given the following dataframe:
zed = data.frame(Title = c('a', NA, NA, NA, NA, NA), TITLE = c(NA, 'b', NA, NA, NA, NA), "Title/Description" = c(NA, NA, 'c', 'd', NA, NA), title = c(NA, NA, NA, NA, 'e', 'f'))
We'd like to consolidate everything down into one Title
column. We can assume that in each row, there will only be 1 value that is not NA, and we are hunting for that value to add into the Title
column. We can consolidate this down via the following approach:
yah <- zed %>%
dplyr::mutate(Title = ifelse(!is.na(Title), Title,
ifelse(!is.na(`Title.Description`), `Title.Description`,
ifelse(!is.na(TITLE), TITLE,
ifelse(!is.na(title), title, ""))))) %>%
dplyr::select(-Title.Description, -TITLE, -title)
however this approach is messy, and becomes even messier if the dataframe has 10 title columns. this approach also breaks if a column is not present in the data:
# there's no TITLE2 --> ERROR
yah <- zed %>%
dplyr::mutate(Title = ifelse(!is.na(Title), Title,
ifelse(!is.na(`Title.Description`), `Title.Description`,
ifelse(!is.na(TITLE2), TITLE2,
ifelse(!is.na(title), title, ""))))) %>%
dplyr::select(-Title.Description, -TITLE2, -title)
any recommendations on how to (a) eliminate the nasty ifelse() nesting and (b) not throw an error if one of the columns is omitted.
CodePudding user response:
dplyr
has the coalesce
function that takes the first non-missing value:
yah <- zed %>%
mutate(Title = coalesce(
Title, Title.Description, TITLE2, title, ""
)
)
This will still break if one of the columns doesn't exist. I'd recommend identifying any missing columns and setting them to NA
before running this.
CodePudding user response:
Another option is to select only the columns that contain the the word title
(regardless of case), then use coalesce
. I also give the new column a different name, so that I could remove all the columns that contain title
, then rename it.
library(tidyverse)
yah <- zed %>%
mutate(temp = coalesce(!!!select(., contains("title")))) %>%
select(-contains("title")) %>%
rename(Title = temp)
Output
yah
Title
1 a
2 b
3 c
4 d
5 e
6 f
Another option is to use max
with c_across
:
zed %>%
rowwise() %>%
mutate(temp = max(c_across(contains("title")), na.rm = TRUE)) %>%
select(-contains("title")) %>%
rename(Title = temp)
Or another option is to use reduce
with pmax
zed %>%
mutate(temp = reduce(select(., contains("title")), pmax, na.rm = TRUE)) %>%
select(-contains("title")) %>%
rename(Title = temp)