How to get the first non-NA date for each row and add it as a new column in the data frame below in-CodePudding

I have a data frame below and i want to choose the first non missing date by each row and add it as a new column to the data frame. enter image description here

CodePudding user response：

You can use {dplyr}

data <- data.frame(x = c(NA, 1), y = 2:3)
library(dplyr, warn.conflicts = FALSE)
data %>% mutate(z = coalesce(!!!.))
#>    x y z
#> 1 NA 2 2
#> 2  1 3 1

^{Created on 2022-04-20 by the reprex package (v2.0.1)}

CodePudding user response：

A base R option:

df$dt5 <- apply(df[,-1], 1, function(z) na.omit(z)[1])
df

Output:

  id        dt1        dt2        dt3        dt4        dt5
1  1       <NA> 2002-01-11 2003-01-01 2004-01-11 2002-01-11
2  2 2001-01-01 2002-10-12 2003-10-12 2004-01-11 2001-01-01
3  3       <NA> 2001-10-12 2002-01-11 2004-11-19 2001-10-12
4  4       <NA>       <NA> 2002-10-12 2002-10-12 2002-10-12

Your data:

df <- data.frame(id = c(1,2,3,4),
                 dt1 = c(NA, "2001-01-01", NA, NA),
                 dt2 = c("2002-01-11", "2002-10-12", "2001-10-12", NA),
                 dt3 = c("2003-01-01", "2003-10-12", "2002-01-11", "2002-10-12"),
                 dt4 = c("2004-01-11", "2004-01-11", "2004-11-19", "2002-10-12"))

CodePudding user response：

We could do it with pivoting: Data from Quinten, many thanks!

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(
    -id,
  ) %>% 
  group_by(id) %>% 
  mutate(first = first(na.omit(value))) %>% 
  pivot_wider(
    names_from = name
  )

     id first      dt1        dt2        dt3        dt4       
  <dbl> <chr>      <chr>      <chr>      <chr>      <chr>     
1     1 2002-01-11 NA         2002-01-11 2003-01-01 2004-01-11
2     2 2001-01-01 2001-01-01 2002-10-12 2003-10-12 2004-01-11
3     3 2001-10-12 NA         2001-10-12 2002-01-11 2004-11-19
4     4 2002-10-12 NA         NA         2002-10-12 2002-10-12