I have a large data frame that looks like this:
#a<- c('Bill', 'Bill', 'Jean', 'Jean', 'Jean', 'Louis')
#b<- c("A", "A", "B", NA, "B", NA)
#c<- c(0, 1, 0, 0, 0, 1)
#d<- c(1, 0, 1, 1, 0, 1)
#e<- c(1, 1, 0, 0, 1, 1)
#f<- data.frame(a, b, c, d, e)
#colnames(f)<- c("name", "letter", "red", "blue", "green")
name letter red blue green
1 Bill A 0 1 1
2 Bill A 1 0 1
3 Jean B 0 1 0
4 Jean <NA> 0 1 0
5 Jean B 0 0 1
6 Louis <NA> 1 1 1
I would like to combine the rows so that each individual (i.e., Bill, Jean, and Louis) are only represented by one row. However, I would like to have the information in that one row be as complete as possible. I would like it to look like the following:
name letter red blue green
1 Bill A 1 1 1
2 Jean B 0 1 1
3 Louis <NA> 1 1 1
My only thought on how to achieve this so far involves a long loop where I check a bunch of conditions before deciding which element of each row to keep, but I was wondering if there's a faster way to do this with some preexisting function? There's no pattern to the missingness in the rows to exploit. Any guidance appreciated.
CodePudding user response:
You can use summarise
:
library(dplyr)
f %>%
group_by(name) %>%
summarise(across(everything(), max, na.rm = T))
name letter red blue green
<chr> <chr> <dbl> <dbl> <dbl>
1 Bill A 1 1 1
2 Jean B 0 1 1
3 Louis NA 1 1 1
CodePudding user response:
We may do
library(data.table)
setDT(f)[, lapply(.SD, max, na.rm = TRUE), name]
-output
name letter red blue green
<char> <char> <int> <int> <int>
1: Bill A 1 1 1
2: Jean B 0 1 1
3: Louis <NA> 1 1 1
CodePudding user response:
Base R:
aggregate(. ~ name, f, FUN = max, na.rm = TRUE, na.action = na.pass)
# Warning in FUN(X[[i]], ...) : no non-missing arguments, returning NA
# name letter red blue green
# 1 Bill A 1 1 1
# 2 Jean B 0 1 1
# 3 Louis <NA> 1 1 1
The warning can be safely ignored here; if you want to suppress it, then perhaps FUN = function(z) suppressWarnings(max(z, na.rm = TRUE))
.