I'm still new to R and could use some help. I have a dataset that looks something like this:
a <- c("a", "b", "c", "d", "a", "d")
E <- c(NA, "E", NA, "E", NA, "E")
F <- c(NA, "F", "F", "F", NA, NA)
G <- c("G", NA, "G", "G", "G", NA)
df <- data.frame(a, E, F, G)
I'm trying to find out which of E, F, or G occurs most often per group when I group by a. My biggest issue seems to be that the values are characters spread over three separate columns. I tried combining them into one column, but it didn't work. I've been searching for hours without finding an answer and am now confused by what I would think should be an easy question. Any help would be amazing. Thanks!
Edit: Sorry, I'm very new to the site and am still getting the formatting down. The correct output would ideally be something like:
  a Mostcommon
  - ----------
1 a "G"
2 b "E" "F"
3 c "F" "G"
4 d "E"
That is for the example I gave. With my actual data there should only be one most common value per group.
CodePudding user response:
You could use the Modes function defined here. I copy-pasted it below:
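Modes <- function(x) {
  ux <- unique(x)                # distinct values, in order of first appearance
  tab <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[tab == max(tab)]            # every value that reaches the highest count
}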
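To see what Modes returns before using it per group, here is a small sketch on hand-made vectors (the inputs are made up for illustration, they are not your data). Note that Modes counts NA like any other value, which is why the NAs need to be dropped first:

```r
# Modes() repeated from above so this snippet runs on its own.
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

single  <- Modes(c("G", "G"))            # "G"
tie     <- Modes(c("E", "F"))            # "E" "F" (both occur once)
mixed   <- Modes(c("E", "F", "G", "E"))  # "E"
with_na <- Modes(c(NA, NA, "G"))         # NA, because NA is counted as a value

# toString() collapses a tie into a single string, e.g. "E, F"
tie_label <- toString(tie)
```

So a tie comes back as a vector with more than one element, and toString() is only there to squeeze that into one cell.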
Now with the modes function, do the following:
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-a, values_drop_na = TRUE) %>%
  group_by(a) %>%
  summarize(most_common = toString(Modes(value)))
# A tibble: 4 x 2
  a     most_common
  <chr> <chr>
1 a     G
2 b     E, F
3 c     F, G
4 d     E
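If you would rather not load any packages, the same idea can probably be done in base R. This is only a sketch under my own assumptions: the names `long` and `result` are made up, and the stacking step is a hand-rolled stand-in for pivot_longer, not what the pipeline above does internally.

```r
# Example data from the question (stringsAsFactors = FALSE for R < 4.0).
a <- c("a", "b", "c", "d", "a", "d")
E <- c(NA, "E", NA, "E", NA, "E")
F <- c(NA, "F", "F", "F", NA, NA)
G <- c("G", NA, "G", "G", "G", NA)
df <- data.frame(a, E, F, G, stringsAsFactors = FALSE)

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Stack E, F and G into one long column, repeating the group key per column.
long <- data.frame(
  a = rep(df$a, times = 3),
  value = c(df$E, df$F, df$G),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$value), ]  # same role as values_drop_na = TRUE

# One comma-separated string of modes per group.
most_common <- tapply(long$value, long$a, function(v) toString(Modes(v)))
result <- data.frame(
  a = names(most_common),
  most_common = as.vector(most_common),
  stringsAsFactors = FALSE
)
result
```

On the example data this should give the same four rows as the tibble above.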
CodePudding user response:
Is this what you'd like to do?
library(tidyverse)
tibble(
  a = c("a", "b", "c", "d", "a", "d"),
  # Real NA values, not the string "NA", so that is.na() can detect them
  E = c(NA, "E", NA, "E", NA, "E"),
  F = c(NA, "F", "F", "F", NA, NA),
  G = c("G", NA, "G", "G", "G", NA)
) |>
  # Turn each cell into 0 (missing) or 1 (present), then count per group
  mutate(across(E:G, ~if_else(is.na(.), 0, 1))) |>
  group_by(a) |>
  summarise(across(E:G, sum))
#> # A tibble: 4 × 4
#>   a         E     F     G
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         0     0     2
#> 2 b         1     1     0
#> 3 c         0     1     1
#> 4 d         2     1     1
Created on 2022-05-03 by the reprex package (v2.0.1)
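The counts above do not yet name the most common column per group. One possible way to get there, sketched in base R: the `counts` data frame below is typed in by hand from the output above, and the column name `most_common` is my own choice.

```r
# Per-group counts, copied by hand from the summarise() output above.
counts <- data.frame(
  a = c("a", "b", "c", "d"),
  E = c(0, 1, 0, 2),
  F = c(0, 1, 1, 1),
  G = c(2, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Work on the count columns as a matrix so each row keeps its column names.
m <- as.matrix(counts[, c("E", "F", "G")])

# For every row, keep the name(s) of the column(s) holding the row maximum.
counts$most_common <- apply(m, 1, function(r) toString(names(r)[r == max(r)]))
counts
```

Ties show up as "E, F" and "F, G" here; if your real data only ever has one winner per group, each cell will hold a single name.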