Home > Mobile >  How to combine rows in R to get most complete data in single row?
How to combine rows in R to get most complete data in single row?

Time:09-28

I have a large data frame that looks like this:

#a<- c('Bill', 'Bill', 'Jean', 'Jean', 'Jean', 'Louis')
#b<- c("A", "A", "B", NA, "B", NA)
#c<- c(0, 1, 0, 0, 0, 1)
#d<- c(1, 0, 1, 1, 0, 1)
#e<- c(1, 1, 0, 0, 1, 1)
#f<- data.frame(a, b, c, d, e)
#colnames(f)<- c("name", "letter", "red", "blue", "green")
 
   name letter red blue green
1  Bill      A   0    1     1
2  Bill      A   1    0     1
3  Jean      B   0    1     0
4  Jean   <NA>   0    1     0
5  Jean      B   0    0     1
6 Louis   <NA>   1    1     1

I would like to combine the rows so that each individual (i.e., Bill, Jean, and Louis) are only represented by one row. However, I would like to have the information in that one row be as complete as possible. I would like it to look like the following:

   name letter red blue green
1  Bill      A   1    1     1
2  Jean      B   0    1     1
3 Louis   <NA>   1    1     1

My only thought on how to achieve this so far involves a long loop where I check a bunch of conditions before deciding which element of each row to keep, but I was wondering if there's a faster way to do this with some preexisting function? There's no pattern to the missingness in the rows to exploit. Any guidance appreciated.

CodePudding user response:

You can use summarise:

library(dplyr)
f %>% 
  group_by(name) %>% 
  summarise(across(everything(), max, na.rm = T))
  name  letter   red  blue green
  <chr> <chr>  <dbl> <dbl> <dbl>
1 Bill  A          1     1     1
2 Jean  B          0     1     1
3 Louis NA         1     1     1

CodePudding user response:

We may do

library(data.table)
setDT(f)[, lapply(.SD, max, na.rm = TRUE), name]

-output

     name letter   red  blue green
   <char> <char> <int> <int> <int>
1:   Bill      A     1     1     1
2:   Jean      B     0     1     1
3:  Louis   <NA>     1     1     1

CodePudding user response:

Base R:

aggregate(. ~ name, f, FUN = max, na.rm = TRUE, na.action = na.pass)
# Warning in FUN(X[[i]], ...) : no non-missing arguments, returning NA
#    name letter red blue green
# 1  Bill      A   1    1     1
# 2  Jean      B   0    1     1
# 3 Louis   <NA>   1    1     1

The warning can be safely ignored here; if you want to suppress it, then perhaps FUN = function(z) suppressWarnings(max(z, na.rm = TRUE)).

  • Related