Collapse multiple rows with conditions

I would like to collapse multiple rows based on conditions using the tidyverse. Here is my example:

df <- data.frame(value = c(2,2,2,2,1,1,1,1,1,1),
                 name1 = c("a", "a", "b", "b", 'c', "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))

I would like to collapse rows so that when name1 is the same and the associated name2 is also the same, those rows are collapsed into a single row. As the desired output shows, rows containing NA should be left as they are. Any suggestions?

My desired output looks like this:

  value name1 name2
1     2     a     x
2     2     b     x
3     1     c     x
4     1     d     x
5     1     e     y
6     1  <NA>  <NA>
7     1  <NA>  <NA>
8     1  <NA>  <NA>

CodePudding user response：

Maybe this helps:

library(dplyr)
df %>% 
    filter(!duplicated(across(everything()))|if_any(everything(), is.na))

-output

  value name1 name2
1     2     a     x
2     2     b     x
3     1     c     x
4     1     d     x
5     1     e     y
6     1  <NA>  <NA>
7     1  <NA>  <NA>
8     1  <NA>  <NA>

If it should be based on a selected set of columns:

df %>%
    filter(!duplicated(across(c(name1, name2)))|if_any(c(name1, name2), is.na))

Or in base R

df[!duplicated(df) | rowSums(is.na(df)) > 0, ]
   value name1 name2
1      2     a     x
3      2     b     x
5      1     c     x
6      1     d     x
7      1     e     y
8      1  <NA>  <NA>
9      1  <NA>  <NA>
10     1  <NA>  <NA>
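
The base R line above compares whole rows. Below is a minimal sketch of the same idea restricted to the key columns, assuming name1 and name2 are the only columns that define a duplicate. The names key_cols, first_seen, has_na and result are illustrative and not part of the original answer.

```r
df <- data.frame(value = c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
                 name1 = c("a", "a", "b", "b", "c", "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))

key_cols <- c("name1", "name2")  # assumed key columns (illustrative)

# TRUE for the first occurrence of each name1/name2 combination
first_seen <- !duplicated(df[key_cols])

# TRUE for rows with an NA in any key column; these are never collapsed
has_na <- !complete.cases(df[key_cols])

result <- df[first_seen | has_na, ]
print(result)
```

Unlike the dplyr versions, the row names still show the original row positions, as in the base R output above.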

CodePudding user response：

Here is a dplyr alternative using a helper column to prepare the data for distinct():

library(dplyr)
df %>% 
  mutate(helper = paste0(name1, name2),
         helper = ifelse(is.na(name1) | is.na(name2), 
                         paste0(helper, row_number()), helper)
         ) %>% 
  distinct(helper, .keep_all = TRUE) %>% 
  select(-helper)

Outcome:

  value name1 name2
1     2     a     x
2     2     b     x
3     1     c     x
4     1     d     x
5     1     e     y
6     1  <NA>  <NA>
7     1  <NA>  <NA>
8     1  <NA>  <NA>
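
One thing to watch with a pasted helper key: paste0() joins values without a separator, so two different name pairs can end up with the same key. This does not happen with the data in the question. The values below are made up to illustrate the point, and the "|" separator is only an assumption that this character never occurs in the real names.

```r
# Two different (name1, name2) pairs that produce the same key
# when pasted without a separator: both become "abc"
collide <- paste0("ab", "c") == paste0("a", "bc")

# With a separator that does not occur in the data, the keys stay
# apart: "ab|c" versus "a|bc"
kept_apart <- paste("ab", "c", sep = "|") != paste("a", "bc", sep = "|")

print(collide)     # TRUE
print(kept_apart)  # TRUE
```

If such collisions are possible in real data, passing a separator to the helper column is a small change that avoids them.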

CodePudding user response：

Another tidyverse option could look as follows.

library(dplyr)

df %>%
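  # if_all() keeps only rows where both names are present, so a row with a
  # single NA is not also returned by the bind_rows() part below
  filter(if_all(name1:name2, ~ !is.na(.))) %>%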
  distinct() %>%
  bind_rows(filter(df, if_any(name1:name2, is.na)))

#   value name1 name2
# 1     2     a     x
# 2     2     b     x
# 3     1     c     x
# 4     1     d     x
# 5     1     e     y
# 6     1  <NA>  <NA>
# 7     1  <NA>  <NA>
# 8     1  <NA>  <NA>
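
The same split-and-recombine idea can be sketched in base R, assuming name1 and name2 are the key columns. Each row has to land in exactly one of the two parts, which complete.cases() and its negation guarantee. The variable names are illustrative.

```r
df <- data.frame(value = c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
                 name1 = c("a", "a", "b", "b", "c", "d", "e", NA, NA, NA),
                 name2 = c("x", "x", "x", "x", "x", "x", "y", NA, NA, NA))

# TRUE where both key columns are present
complete <- complete.cases(df[c("name1", "name2")])

# Part 1: complete rows, with repeated rows dropped
collapsed <- unique(df[complete, ])

# Part 2: rows with an NA in a key column, kept exactly as they are
untouched <- df[!complete, ]

result <- rbind(collapsed, untouched)
print(result)
```

As with the bind_rows() version, the NA rows always end up at the bottom, so the original row order is only preserved when they were already last.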