I'm modifying this toy df from this question, which is similar to mine but different enough that its answer has left me slightly confused.
df <- data.frame(id1 = c("a" , "NA", "NA", "c"),
id2 = c(NA,"a","a",NA),
id3 = c("a", "a", "e", "e"),
n1 = c(2,2,3,3),
n2 = c(2,2,1,1),
n3 = c(0,0,3,3),
n4 = c(0,0,2,2))
This produces a dataframe looking like this:
id1 id2 id3 n1 n2 n3 n4
a NA a 2 2 0 0
NA a a 2 2 0 0
NA a e 3 1 3 2
c NA e 3 1 3 2
Aside from id1 and id2, the first two rows and the last two rows are identical. I'm trying to fill in the blanks to make them completely identical, so I can then apply distinct() so that the now-duplicated rows disappear, resulting in a dataframe like this:
id1 id2 id3 n1 n2 n3 n4
a a a 2 2 0 0
c a e 3 1 3 2
Is there any way to accomplish this (preferably a tidyverse solution)? I'm basically trying to collapse all my data's redundancies.
CodePudding user response:
Perhaps something like this?
df %>%
group_by(id3, n1, n2, n3, n4) %>%
summarise(id1 = na.omit(id1),
id2 = na.omit(id2)) %>%
ungroup() %>%
select(id1,id2,id3,n1,n2,n3,n4)
output
# A tibble: 2 × 7
id1 id2 id3 n1 n2 n3 n4
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a a a 2 2 0 0
2 c a e 3 1 3 2
This solution is very specific to this scenario. It would not work if you had multiple id1s per group for example.
CodePudding user response:
Another possible solution where I first created an index to group on:
df <- data.frame(id1 = c("a" , "NA", "NA", "c"),
id2 = c(NA,"a","a",NA),
id3 = c("a", "a", "e", "e"),
n1 = c(2,2,3,3),
n2 = c(2,2,1,1),
n3 = c(0,0,3,3),
n4 = c(0,0,2,2))
library(dplyr)
df %>%
mutate(index = rep(seq_len(2), each=2)) %>%
group_by(index) %>%
arrange(id1) %>%
summarise(across(everything(), funs(first(.[!is.na(.)])))) %>%
select(-index)
#> # A tibble: 2 × 7
#> id1_first id2_first id3_first n1_first n2_first n3_first n4_first
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a a a 2 2 0 0
#> 2 c a e 3 1 3 2
Created on 2022-07-09 by the reprex package (v2.0.1)
CodePudding user response:
Another possible solution:
library(tidyverse)
df %>%
group_by(id3, across(n1:n4)) %>%
fill(id1:id2, .direction = "updown") %>%
ungroup %>%
distinct
#> # A tibble: 2 × 7
#> id1 id2 id3 n1 n2 n3 n4
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a a a 2 2 0 0
#> 2 c a e 3 1 3 2