I have data like so:
name1 <- c("a", NA, "b")
name2 <- c(NA, "c", "d")
mydf <- cbind.data.frame(name1, name2)
I want to create a variable which takes only the non-NA values from the two columns, and in the case of both names being different and non-NA, it only keeps the first one. The result therefore should be "a", "c", "b" (keeps a from the first row, c from the second, and b from the third). Would be good if it works for more than 2 columns.
CodePudding user response:
If you are looking for a tidy approach, you may use coalesce()
, where you can either specify a set of columns or use the full dataset.
I extended your example to demonstrate use of multiple columns and remaining NAs.
library(tidyverse)
name1 <- c("a", NA, "b", NA, NA)
name2 <- c(NA, "c", "d", NA, NA)
name3 <- c(letters[5:8], NA)
name4 <- rep(NA, 5)
mydf <- cbind.data.frame(name1, name2, name3, name4)
mydf
#> name1 name2 name3 name4
#> 1 a <NA> e NA
#> 2 <NA> c f NA
#> 3 b d g NA
#> 4 <NA> <NA> h NA
#> 5 <NA> <NA> <NA> NA
mydf %>%
as_tibble() %>%
# specify columns explicitly
mutate(new1 = coalesce(name1, name2, name3)) %>%
# or use all column in given order
mutate(new2 = coalesce(!!!syms(colnames(.))))
#> # A tibble: 5 × 6
#> name1 name2 name3 name4 new1 new2
#> <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 a <NA> e NA a a
#> 2 <NA> c f NA c c
#> 3 b d g NA b b
#> 4 <NA> <NA> h NA h h
#> 5 <NA> <NA> <NA> NA <NA> <NA>
Created on 2022-08-23 by the reprex package (v2.0.1)
CodePudding user response:
You can try something like this:
library(dplyr)
library(tidyr)
mydf %>%
# paste columns together, skipping NAs
unite(., col = "mycol", name1, name2, na.rm=TRUE, sep = "", remove = F) %>%
# get the only first element of the new column
mutate(mycol = substr(mycol,1,1))
mycol name1 name2
1 a a <NA>
2 c <NA> c
3 b b d