Home > Mobile >  How to collapse redundant rows together to get rid of mirrored NAs in two columns?
How to collapse redundant rows together to get rid of mirrored NAs in two columns?

Time:07-10

I'm modifying this toy df from this question, which is similar to mine but different enough that its answer has left me slightly confused.

df <- data.frame(id1 = c("a" , "NA", "NA", "c"),
                 id2 = c(NA,"a","a",NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2,2,3,3),
                 n2 = c(2,2,1,1),
                 n3 = c(0,0,3,3),
                 n4 = c(0,0,2,2))

This produces a dataframe looking like this:

id1 id2 id3 n1 n2 n3 n4
a   NA  a   2  2  0  0
NA  a   a   2  2  0  0
NA  a   e   3  1  3  2
c   NA  e   3  1  3  2

Aside from id1 and id2, the first two rows and the last two rows are identical. I'm trying to fill in the blanks to make them completely identical, so I can then apply distinct() so that the now-duplicated rows disappear, resulting in a dataframe like this:

id1 id2 id3 n1 n2 n3 n4
a   a  a   2  2  0  0
c   a  e   3  1  3  2

Is there any way to accomplish this (preferably a tidyverse solution)? I'm basically trying to collapse all my data's redundancies.

CodePudding user response:

Perhaps something like this?

df %>% 
  group_by(id3, n1, n2, n3, n4) %>% 
  summarise(id1 = na.omit(id1),
            id2 = na.omit(id2)) %>% 
  ungroup() %>% 
  select(id1,id2,id3,n1,n2,n3,n4)

output

# A tibble: 2 × 7
  id1   id2   id3   n1    n2    n3    n4   
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a     a     a     2     2     0     0    
2 c     a     e     3     1     3     2

This solution is very specific to this scenario. It would not work if you had multiple id1s per group for example.

CodePudding user response:

Another possible solution where I first created an index to group on:

df <- data.frame(id1 = c("a" , "NA", "NA", "c"),
                 id2 = c(NA,"a","a",NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2,2,3,3),
                 n2 = c(2,2,1,1),
                 n3 = c(0,0,3,3),
                 n4 = c(0,0,2,2))

library(dplyr)
df %>%
  mutate(index = rep(seq_len(2), each=2)) %>%
  group_by(index) %>%
  arrange(id1) %>%
  summarise(across(everything(), funs(first(.[!is.na(.)])))) %>%
  select(-index)
#> # A tibble: 2 × 7
#>   id1_first id2_first id3_first n1_first n2_first n3_first n4_first
#>   <chr>     <chr>     <chr>        <dbl>    <dbl>    <dbl>    <dbl>
#> 1 a         a         a                2        2        0        0
#> 2 c         a         e                3        1        3        2

Created on 2022-07-09 by the reprex package (v2.0.1)

CodePudding user response:

Another possible solution:

library(tidyverse)

df %>% 
  group_by(id3, across(n1:n4)) %>% 
  fill(id1:id2, .direction = "updown") %>% 
  ungroup %>% 
  distinct

#> # A tibble: 2 × 7
#>   id1   id2   id3      n1    n2    n3    n4
#>   <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a     a     a         2     2     0     0
#> 2 c     a     e         3     1     3     2
  • Related