I have a dataframe:
Number Superclass Class Subclass
<dbl> <chr> <chr> <chr>
1 3 NA Class: Benzene and substituted derivatives NA
2 3 Superclass: Benzenoids NA NA
3 4 Superclass: Painkiller NA NA
I'd like to flatten the dataframe and merge up so that I have Superclass, class and subclass on the same row:
Number Superclass Class Subclass
<dbl> <chr> <chr> <chr>
1 3 Superclass: Benzenoids Class: Benzene and substituted derivatives NA
2 4 Superclass: Painkiller NA NA
I've tried
df%>%
group_by(Number) %>%
summarise_all(na.omit)
but it only includes rows where all three classes are present, and removes any that only have a superclass or super and class
CodePudding user response:
After grouping by 'Number' summarise
across
(_all/_at/_if
are deprecated in favor of across
) the rest of the columns (everything()
), check if
all
values are NA
, then get the first
element or else paste
the non-NA elements with toString
(comma separated values)
library(dplyr)
df%>%
group_by(Number) %>%
summarise(across(everything(), ~ if(all(is.na(.x))) first(.x)
else toString(.x[complete.cases(.x)])))
-output
# A tibble: 2 × 4
Number Superclass Class Subclass
<int> <chr> <chr> <lgl>
1 3 Superclass: Benzenoids Class: Benzene and substituted derivatives NA
2 4 Superclass: Painkiller <NA> NA
data
df <- structure(list(Number = c(3L, 3L, 4L), Superclass = c(NA, "Superclass: Benzenoids",
"Superclass: Painkiller"), Class = c("Class: Benzene and substituted derivatives",
NA, NA), Subclass = c(NA, NA, NA)), class = "data.frame", row.names = c("1",
"2", "3"))