Home > Blockchain >  How to merge rows with duplicate ID, replacing NAs with data in the other row, and leading with data
How to merge rows with duplicate ID, replacing NAs with data in the other row, and leading with data

Time:11-23

I have a df like this:

data <- tribble(~id, ~othervar, ~it_1, ~it_2, ~it_3, ~it_4, ~it_5, ~it_6,
              "k01", "lum", "a", "b", "c", "a", NA, NA,
              "k01", "lum", NA, NA, NA, NA, "a", "d",
              "k02", "nar", "a", "b", "c", "b", NA, NA,
              "k03", "lum", "a", "b", "a", "c", NA, NA,
              "k03", "lum", "b", "b", "a", NA, "d", "e")

I want to merge rows with duplicated IDs in only one row where NAs are replaced with the information available in the other row. But where there are no-NA in both rows, the problem is to preserve any one. I´ve tried pivoting the table, but have no resources to deal with this.

i expect somthing like this:

id  othervar it_1 it_2 it_3 it_4 it_5 it_6
k01 lum      a    b    c    a    a    d
k02 nar      a    b    c    b    NA   NA
k03 lum      a    b    a    c    d    e
              

CodePudding user response:

With ifelse and summarise:

library(dplyr)
data %>% 
  group_by(id) %>% 
  summarise(across(everything(), ~ ifelse(any(complete.cases(.x)),
                                          first(.x[!is.na(.x)]),
                                          NA)))

# A tibble: 3 × 8
  id    othervar it_1  it_2  it_3  it_4  it_5  it_6 
  <chr> <chr>    <chr> <chr> <chr> <chr> <chr> <chr>
1 k01   lum      a     b     c     a     a     d    
2 k02   nar      a     b     c     b     NA    NA   
3 k03   lum      a     b     a     c     d     e    

CodePudding user response:

Without ifelse with dplyr functions only:

data %>%
group_by(id) %>%
summarise(across(everything(), 
~coalesce(.x) %>% 
`[`(!is.na(.)) %>% 
`[`(1) ))
  • Related