Aggregate rows with multiple conditions-CodePudding

I have a df as below

df <- data.frame(col1 = c("a", "a", "b",
                          "b", "c", "c"),
                 col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
                 col3 = c(1, NA, 2, NA, 3, NA),
                 col4 = c(NA, 1, NA, 2, NA, 3))
df
  col1 col2 col3 col4
1    a   x1    1   NA
2    a x1.1   NA    1
3    b   x2    2   NA
4    b x2.1   NA    2
5    c   x3    3   NA
6    c x3.1   NA    3

I want to merge rows that have the same letter in column col1 and filter rows in column col2 by telling them col2 %in% c(x1,x1.1) & col2 %in% c(x2,x2.1) & col3 %in% (x3,x3.1), simulatenously.

My desired output would be:

  col1 col2 col3 col4
1    a   x1    1   1
2    b   x2    2   2
3    c   x3    3   3

One solution from my side is to call if x == "x1", then col4 will be filled by values assosicated with x == "x1.1"

Any suggestions for this to group_by col1? Thank you in advance!

CodePudding user response：

Not sure if I'm missing something, but for your example data the approach below works:

library(dplyr)

df %>% 
  mutate(col2 = sub("\\.[0-9] $", "", col2),
         col3 = coalesce(col3, col4),
         col4 = coalesce(col4, col3)) %>% 
  distinct()
#>   col1 col2 col3 col4
#> 1    a   x1    1    1
#> 2    b   x2    2    2
#> 3    c   x3    3    3

^{Created on 2022-11-29 by the reprex package (v2.0.1)}

CodePudding user response：

After extracting only the first to characters of col2, you can use coalesce to bring col3 and col4 together. Then replacing NA by data from the coalesced column and using distinct at the end will lead to the desired output.

library(tidyverse)

df <- data.frame(col1 = c("a", "a", "b",
                          "b", "c", "c"),
                 col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
                 col3 = c(1, NA, 2, NA, 3, NA),
                 col4 = c(NA, 1, NA, 2, NA, 3))


df |> 
  mutate(col2 = substring(col2, 1,2)) |> 
  mutate(mycol = coalesce(col3, col4)) |>
  mutate(col3 = if_else(is.na(col3), mycol, col3),
         col4 = ifelse(is.na(col4), mycol, col4)) |> 
  select(-c(mycol)) |> 
  distinct()
#>   col1 col2 col3 col4
#> 1    a   x1    1    1
#> 2    b   x2    2    2
#> 3    c   x3    3    3