I would like to collapse multiple rows based on conditions using the tidyverse. Here is my example:
df <- data.frame(value = c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
                 name1 = c("a", "a", "b", "b", "c", "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))
I would like to collapse rows in which both name1 and name2 are the same into a single row. The rows where the names are NA should not be collapsed. Any suggestions?
My desired output looks like this:
value name1 name2
1 2 a x
2 2 b x
3 1 c x
4 1 d x
5 1 e y
6 1 <NA> <NA>
7 1 <NA> <NA>
8 1 <NA> <NA>
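Before picking a method, it can help to count how often each (name1, name2) pair occurs: pairs with n > 1 are the ones to collapse, while the NA pair is the group the desired output keeps intact. A small sketch using dplyr's `count()` (the name `pair_counts` is just illustrative):

```r
library(dplyr)

df <- data.frame(value = c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
                 name1 = c("a", "a", "b", "b", "c", "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))

# one row per (name1, name2) pair, with the number of occurrences in column n
pair_counts <- count(df, name1, name2)
pair_counts
```

Here ("a", "x") and ("b", "x") each appear twice and should shrink to one row, and the NA pair appears three times but should stay as three rows.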
Answer:
Maybe this helps:
library(dplyr)
df %>%
  # keep the first copy of each row, plus every row that contains an NA
  filter(!duplicated(across(everything())) | if_any(everything(), is.na))
Output:
value name1 name2
1 2 a x
2 2 b x
3 1 c x
4 1 d x
5 1 e y
6 1 <NA> <NA>
7 1 <NA> <NA>
8 1 <NA> <NA>
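The filter condition above is the OR of two row-wise logical vectors. To see what each part contributes, a sketch can store them as columns; here `data.frame(value, name1, name2)` stands in for `across(everything())`, and the column names `is_dup`, `has_na` and `keep` are illustrative only:

```r
library(dplyr)

df <- data.frame(value = c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
                 name1 = c("a", "a", "b", "b", "c", "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))

flags <- df %>%
  mutate(
    # TRUE for the second and later copies of a whole row (NA matches NA)
    is_dup = duplicated(data.frame(value, name1, name2)),
    # TRUE when any of the original columns is NA
    has_na = if_any(c(value, name1, name2), is.na),
    # a row survives if it is a first copy or contains an NA
    keep   = !is_dup | has_na
  )
flags
```

Rows 9 and 10 are flagged as duplicates of row 8, but `has_na` rescues them, which is why all three NA rows appear in the output.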
If duplicates should be judged on selected columns only (here name1 and name2):
df %>%
  filter(!duplicated(across(c(name1, name2))) | if_any(c(name1, name2), is.na))
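The choice of columns only matters when rows share the same names but differ in value. A base R sketch on a small hypothetical data frame `df2` (not from the question) shows the two rules side by side:

```r
# Hypothetical data: the pair ("a", "x") appears with two different values
df2 <- data.frame(value = c(2, 3, 1),
                  name1 = c("a", "a", NA),
                  name2 = c("x", "x", NA))

any_na <- is.na(df2$name1) | is.na(df2$name2)

# Whole-row rule: rows 1 and 2 differ in `value`, so both are kept
by_all <- df2[!duplicated(df2) | any_na, ]

# Names-only rule: row 2 repeats ("a", "x") and is dropped
by_names <- df2[!duplicated(df2[c("name1", "name2")]) | any_na, ]

by_all
by_names
```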
Or in base R:
df[!duplicated(df) | rowSums(is.na(df)) > 0, ]
value name1 name2
1 2 a x
3 2 b x
5 1 c x
6 1 d x
7 1 e y
8 1 <NA> <NA>
9 1 <NA> <NA>
10 1 <NA> <NA>
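Unlike the dplyr versions, the base R result keeps the original row names (1, 3, 5, ...). If sequential row names are preferred, they can be reset afterwards. This sketch also uses `complete.cases()`, which is FALSE for any row containing an NA, as an alternative to `rowSums(is.na(df)) > 0`:

```r
df <- data.frame(value = c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
                 name1 = c("a", "a", "b", "b", "c", "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))

# TRUE for rows that contain at least one NA
any_na <- !complete.cases(df)

out <- df[!duplicated(df) | any_na, ]

# drop the original row names so the rows are renumbered 1..n
rownames(out) <- NULL
out
```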
Answer:
Here is a dplyr alternative that builds a helper column and then applies distinct() to it:
library(dplyr)
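df %>%
  # key from both names; the separator keeps pairs such as ("ab", "c")
  # and ("a", "bc") from producing the same key
  mutate(helper = paste(name1, name2, sep = "|"),
         # rows with an NA get the row number appended, so each key is
         # unique and distinct() keeps all of them
         helper = ifelse(is.na(name1) | is.na(name2),
                         paste0(helper, row_number()), helper)
  ) %>%
  distinct(helper, .keep_all = TRUE) %>%
  select(-helper)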
Outcome:
value name1 name2
1 2 a x
2 2 b x
3 1 c x
4 1 d x
5 1 e y
6 1 <NA> <NA>
7 1 <NA> <NA>
8 1 <NA> <NA>
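A note on building the key: gluing name1 and name2 together with `paste0()` and no separator can map two different pairs to the same string, and distinct() would then drop a row that should be kept. A separator that does not occur in the data (`"|"` is assumed here) avoids this. A quick base R check with made-up values:

```r
# Two different (name1, name2) pairs, for illustration only
n1 <- c("ab", "a")
n2 <- c("c", "bc")

key_plain <- paste0(n1, n2)            # both become "abc"
key_sep   <- paste(n1, n2, sep = "|")  # "ab|c" and "a|bc"

duplicated(key_plain)  # second key looks like a duplicate
duplicated(key_sep)    # no false duplicate
```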
Answer:
Another tidyverse option could look as follows.
library(dplyr)
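df %>%
  # rows where both names are present: deduplicate them
  filter(if_all(name1:name2, ~ !is.na(.))) %>%
  distinct() %>%
  # then append every row that has an NA in either name column
  bind_rows(filter(df, if_any(name1:name2, is.na)))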
# value name1 name2
# 1 2 a x
# 2 2 b x
# 3 1 c x
# 4 1 d x
# 5 1 e y
# 6 1 <NA> <NA>
# 7 1 <NA> <NA>
# 8 1 <NA> <NA>
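Because this approach appends the NA rows at the end, it reorders the data whenever an NA row is not already last. One way to keep the original order, sketched under the assumption that a temporary column named `row_id` does not clash with existing columns, is to remember each row's position and sort by it afterwards (`collapse_keep_na` and `df3` are hypothetical names):

```r
library(dplyr)

# Hypothetical helper: deduplicate complete rows, keep every NA row,
# and restore the original row order via a temporary row_id column
collapse_keep_na <- function(data) {
  indexed <- data %>% mutate(row_id = row_number())

  complete_rows <- indexed %>%
    filter(if_all(c(name1, name2), ~ !is.na(.x))) %>%
    # ignore row_id when comparing, but keep the first occurrence's id
    distinct(value, name1, name2, .keep_all = TRUE)

  na_rows <- indexed %>%
    filter(if_any(c(name1, name2), is.na))

  bind_rows(complete_rows, na_rows) %>%
    arrange(row_id) %>%
    select(-row_id)
}

# Hypothetical input with the NA row first
df3 <- data.frame(value = c(1, 1, 1),
                  name1 = c(NA, "a", "a"),
                  name2 = c(NA, "x", "x"))

collapse_keep_na(df3)
```

On `df3` the NA row stays first, whereas the `bind_rows()` version above would move it after ("a", "x").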