Home > Enterprise >  Get the latest updated value from last non-na column
Get the latest updated value from last non-na column

Time:09-28

I have a dataframe like this

df <- data.frame(name1 = c("a" , "a", "a", "a", "c", "c", "c", "c"),
                 name2 = c(NA,"a","a",NA, NA, "c", "c", NA),
                 name3 = c(NA, "b", "b", NA, NA, "d","d",NA))

Then, I did make a new column based on some conditions

library(tidyverse)
df %>% mutate(name4 = ifelse(!is.na(name3), name3, name1))

    name1 name2 name3 name4
1     a  <NA>  <NA>     a
2     a     a     b     b
3     a     a     b     b
4     a  <NA>  <NA>     a
5     c  <NA>  <NA>     c
6     c    c     d      d
7     c    c     d      d
8     c  <NA>  <NA>     c   

I would like to replace a, c by b, d in name4, respectively without calling the character i.e a, b. (Making another column also a good option right?)

Any suggestions for this?

Desired output

    name1 name2 name3 name4
1     a  <NA>  <NA>     b
2     a     a     b     b
3     a     a     b     b
4     a  <NA>  <NA>     b
5     c  <NA>  <NA>     d
6     c    c     d      d
7     c    c     d      d
8     c  <NA>  <NA>     d   

CodePudding user response:

Here are two possible answers:

df <- data.frame(name1 = c("a" , "a", "a", "a", "c", "c", "c", "c"),
                 name2 = c(NA,"a","a",NA, NA, "c", "c", NA),
                 name3 = c(NA, "b", "b", NA, NA, "d","d",NA))
library(tidyverse)
df %>% mutate(name4 = ifelse(!is.na(name3), name3, name1), 
              name4=sub('a','b', sub('c','d',name4)))
#>   name1 name2 name3 name4
#> 1     a  <NA>  <NA>     b
#> 2     a     a     b     b
#> 3     a     a     b     b
#> 4     a  <NA>  <NA>     b
#> 5     c  <NA>  <NA>     d
#> 6     c     c     d     d
#> 7     c     c     d     d
#> 8     c  <NA>  <NA>     d
df %>% mutate(name4 = ifelse(!is.na(name3), name3, name1), 
              name4=c('a'='b','c'='d','b'='b','d'='d')[name4])
#>   name1 name2 name3 name4
#> 1     a  <NA>  <NA>     b
#> 2     a     a     b     b
#> 3     a     a     b     b
#> 4     a  <NA>  <NA>     b
#> 5     c  <NA>  <NA>     d
#> 6     c     c     d     d
#> 7     c     c     d     d
#> 8     c  <NA>  <NA>     d

Created on 2021-09-28 by the reprex package (v2.0.1)

CodePudding user response:

fill the NAs, then use coalesce from right to left, getting the latest name for name4 column:

df %>% 
  group_by(name1) %>% 
  fill(name2, name3, .direction = "downup") %>% 
  mutate(name4 = coalesce(name3, name2, name1))

## A tibble: 8 x 4
## Groups:   name1 [2]
#  name1 name2 name3 name4
#  <chr> <chr> <chr> <chr>
#1 a     a     b     b    
#2 a     a     b     b    
#3 a     a     b     b    
#4 a     a     b     b    
#5 c     c     d     d    
#6 c     c     d     d    
#7 c     c     d     d    
#8 c     c     d     d    
  • Related