I have a df
as below
df <- data.frame(col1 = c("a", "a", "b",
"b", "c", "c"),
col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
col3 = c(1, NA, 2, NA, 3, NA),
col4 = c(NA, 1, NA, 2, NA, 3))
df
col1 col2 col3 col4
1 a x1 1 NA
2 a x1.1 NA 1
3 b x2 2 NA
4 b x2.1 NA 2
5 c x3 3 NA
6 c x3.1 NA 3
I want to merge rows that have the same letter in column col1
and filter rows in column col2
by telling them col2 %in% c(x1,x1.1) & col2 %in% c(x2,x2.1) & col3 %in% (x3,x3.1)
, simulatenously.
My desired output would be:
col1 col2 col3 col4
1 a x1 1 1
2 b x2 2 2
3 c x3 3 3
One solution from my side is to call if x == "x1"
, then col4
will be filled by values assosicated with x == "x1.1"
Any suggestions for this to group_by
col1
? Thank you in advance!
CodePudding user response:
Not sure if I'm missing something, but for your example data the approach below works:
library(dplyr)
df %>%
mutate(col2 = sub("\\.[0-9] $", "", col2),
col3 = coalesce(col3, col4),
col4 = coalesce(col4, col3)) %>%
distinct()
#> col1 col2 col3 col4
#> 1 a x1 1 1
#> 2 b x2 2 2
#> 3 c x3 3 3
Created on 2022-11-29 by the reprex package (v2.0.1)
CodePudding user response:
After extracting only the first to characters of col2
, you can use coalesce
to bring col3
and col4
together.
Then replacing NA
by data from the coalesced column and using distinct
at the end will lead to the desired output.
library(tidyverse)
df <- data.frame(col1 = c("a", "a", "b",
"b", "c", "c"),
col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
col3 = c(1, NA, 2, NA, 3, NA),
col4 = c(NA, 1, NA, 2, NA, 3))
df |>
mutate(col2 = substring(col2, 1,2)) |>
mutate(mycol = coalesce(col3, col4)) |>
mutate(col3 = if_else(is.na(col3), mycol, col3),
col4 = ifelse(is.na(col4), mycol, col4)) |>
select(-c(mycol)) |>
distinct()
#> col1 col2 col3 col4
#> 1 a x1 1 1
#> 2 b x2 2 2
#> 3 c x3 3 3