Home > Net >  Flatten rows with same identifier in R
Flatten rows with same identifier in R

Time:11-08

I have a dataframe:

   Number Superclass                                Class                                      Subclass                                      
    <dbl> <chr>                                     <chr>                                      <chr>                                         
 1      3 NA                                        Class: Benzene and substituted derivatives NA                                            
 2      3 Superclass: Benzenoids                    NA                                         NA 
 3      4 Superclass: Painkiller                    NA                                         NA

I'd like to flatten the dataframe and merge up so that I have Superclass, class and subclass on the same row:

   Number Superclass                                Class                                      Subclass                                      
    <dbl> <chr>                                     <chr>                                      <chr>                                         
 1      3 Superclass: Benzenoids                    Class: Benzene and substituted derivatives NA                                            
 2      4 Superclass: Painkiller                    NA                                         NA

I've tried

df%>%
  group_by(Number) %>%
  summarise_all(na.omit)

but it only includes rows where all three classes are present, and removes any that only have a superclass or super and class

CodePudding user response:

After grouping by 'Number' summarise across (_all/_at/_if are deprecated in favor of across) the rest of the columns (everything()), check if all values are NA, then get the first element or else paste the non-NA elements with toString (comma separated values)

library(dplyr)
df%>%
  group_by(Number) %>%
  summarise(across(everything(), ~ if(all(is.na(.x))) first(.x) 
      else toString(.x[complete.cases(.x)])))

-output

# A tibble: 2 × 4
  Number Superclass             Class                                      Subclass
   <int> <chr>                  <chr>                                      <lgl>   
1      3 Superclass: Benzenoids Class: Benzene and substituted derivatives NA      
2      4 Superclass: Painkiller <NA>                                       NA      

data

df <- structure(list(Number = c(3L, 3L, 4L), Superclass = c(NA, "Superclass: Benzenoids", 
"Superclass: Painkiller"), Class = c("Class: Benzene and substituted derivatives", 
NA, NA), Subclass = c(NA, NA, NA)), class = "data.frame", row.names = c("1", 
"2", "3"))
  • Related