Home > OS >  Create a variable with only non NA values in R from multiple columns
Create a variable with only non NA values in R from multiple columns

Time:08-24

I have data like so:

name1 <- c("a", NA, "b")
name2 <- c(NA, "c", "d")

mydf <- cbind.data.frame(name1, name2)

I want to create a variable which takes only the non-NA values from the two columns, and in the case of both names being different and non-NA, it only keeps the first one. The result therefore should be "a", "c", "b" (keeps a from the first row, c from the second, and b from the third). Would be good if it works for more than 2 columns.

CodePudding user response:

If you are looking for a tidy approach, you may use coalesce(), where you can either specify a set of columns or use the full dataset.

I extended your example to demonstrate use of multiple columns and remaining NAs.

library(tidyverse)
name1 <- c("a", NA, "b", NA, NA)
name2 <- c(NA, "c", "d", NA, NA)
name3 <- c(letters[5:8], NA)
name4 <- rep(NA, 5)

mydf <- cbind.data.frame(name1, name2, name3, name4)
mydf
#>   name1 name2 name3 name4
#> 1     a  <NA>     e    NA
#> 2  <NA>     c     f    NA
#> 3     b     d     g    NA
#> 4  <NA>  <NA>     h    NA
#> 5  <NA>  <NA>  <NA>    NA

mydf %>% 
  as_tibble() %>% 
  # specify columns explicitly
  mutate(new1 = coalesce(name1, name2, name3)) %>% 
  # or use all column in given order
  mutate(new2 = coalesce(!!!syms(colnames(.))))

#> # A tibble: 5 × 6
#>   name1 name2 name3 name4 new1  new2 
#>   <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 a     <NA>  e     NA    a     a    
#> 2 <NA>  c     f     NA    c     c    
#> 3 b     d     g     NA    b     b    
#> 4 <NA>  <NA>  h     NA    h     h    
#> 5 <NA>  <NA>  <NA>  NA    <NA>  <NA>

Created on 2022-08-23 by the reprex package (v2.0.1)

CodePudding user response:

You can try something like this:

library(dplyr)
library(tidyr)

mydf %>%
  # paste columns together, skipping NAs
  unite(., col = "mycol",  name1, name2, na.rm=TRUE, sep = "", remove = F) %>%
  # get the only first element of the new column
  mutate(mycol = substr(mycol,1,1))

  mycol name1 name2
1     a     a  <NA>
2     c  <NA>     c
3     b     b     d
  • Related