How to collapse redundant rows together to get rid of mirrored NAs in two columns?-CodePudding

I'm modifying this toy df from this question, which is similar to mine but different enough that its answer has left me slightly confused.

df <- data.frame(id1 = c("a" , "NA", "NA", "c"),
                 id2 = c(NA,"a","a",NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2,2,3,3),
                 n2 = c(2,2,1,1),
                 n3 = c(0,0,3,3),
                 n4 = c(0,0,2,2))

This produces a dataframe looking like this:

id1 id2 id3 n1 n2 n3 n4
a   NA  a   2  2  0  0
NA  a   a   2  2  0  0
NA  a   e   3  1  3  2
c   NA  e   3  1  3  2

Aside from id1 and id2, the first two rows and the last two rows are identical. I'm trying to fill in the blanks to make them completely identical, so I can then apply distinct() so that the now-duplicated rows disappear, resulting in a dataframe like this:

id1 id2 id3 n1 n2 n3 n4
a   a  a   2  2  0  0
c   a  e   3  1  3  2

Is there any way to accomplish this (preferably a tidyverse solution)? I'm basically trying to collapse all my data's redundancies.

CodePudding user response：

Perhaps something like this?

df %>% 
  group_by(id3, n1, n2, n3, n4) %>% 
  summarise(id1 = na.omit(id1),
            id2 = na.omit(id2)) %>% 
  ungroup() %>% 
  select(id1,id2,id3,n1,n2,n3,n4)

output

# A tibble: 2 × 7
  id1   id2   id3   n1    n2    n3    n4   
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a     a     a     2     2     0     0    
2 c     a     e     3     1     3     2

This solution is very specific to this scenario. It would not work if you had multiple id1s per group for example.

CodePudding user response：

Another possible solution where I first created an index to group on:

df <- data.frame(id1 = c("a" , "NA", "NA", "c"),
                 id2 = c(NA,"a","a",NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2,2,3,3),
                 n2 = c(2,2,1,1),
                 n3 = c(0,0,3,3),
                 n4 = c(0,0,2,2))

library(dplyr)
df %>%
  mutate(index = rep(seq_len(2), each=2)) %>%
  group_by(index) %>%
  arrange(id1) %>%
  summarise(across(everything(), funs(first(.[!is.na(.)])))) %>%
  select(-index)
#> # A tibble: 2 × 7
#>   id1_first id2_first id3_first n1_first n2_first n3_first n4_first
#>   <chr>     <chr>     <chr>        <dbl>    <dbl>    <dbl>    <dbl>
#> 1 a         a         a                2        2        0        0
#> 2 c         a         e                3        1        3        2

^{Created on 2022-07-09 by the reprex package (v2.0.1)}

CodePudding user response：

Another possible solution:

library(tidyverse)

df %>% 
  group_by(id3, across(n1:n4)) %>% 
  fill(id1:id2, .direction = "updown") %>% 
  ungroup %>% 
  distinct

#> # A tibble: 2 × 7
#>   id1   id2   id3      n1    n2    n3    n4
#>   <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a     a     a         2     2     0     0
#> 2 c     a     e         3     1     3     2