mydat <- data.frame(name = c("James", "James", "James", "James", "Leila", "Leila", "Cici", "Bob", "Bob", "Holly", "Topaz", "Topaz"),
code = c(123, 928, 981, 333, 981, 928, 463, 123, 928, 981, 333, 444))
> mydat
name code
1 James 123
2 James 928
3 James 981
4 James 333
5 Leila 981
6 Leila 928
7 Cici 463
8 Bob 123
9 Bob 928
10 Holly 981
11 Topaz 333
12 Topaz 444
In mydat
, each person can have multiple code
s associated with it. I want to tabulate the code
column and assign the person the code that is most common.
> rbind(table(mydat), Total = colSums(table(mydat)))
123 333 444 463 928 981
Bob 1 0 0 0 1 0
Cici 0 0 0 1 0 0
Holly 0 0 0 0 0 1
James 1 1 0 0 1 1
Leila 0 0 0 0 1 1
Topaz 0 1 1 0 0 0
Total 2 2 1 1 3 3
Bob has code
s 123 and 928. Since 928 appears more often across mydat
than 123, Bob will be assigned a code
of 928.
Cici has code
463, so Cici will be assigned a code of 463.
Holly has code
981, so Holly will be assigned a code of 981.
James has code
s 123, 333, 928, and 981. Since 928 and 981 both appear more frequently than 123 and 333, James will be assigned code
s 928 and 981.
The final output should be:
> final_mydat
name final_code
1 Bob 928
2 Cici 463
3 Holly 981
4 James 928
5 James 981
6 Leila 928
7 Leila 981
8 Topaz 333
Is there a quick way to do this in R?
CodePudding user response:
library(dplyr)
mydat %>%
add_count(code) %>%
group_by(name) %>%
slice_max(n) %>%
ungroup()
# # A tibble: 8 × 3
# name code n
# <chr> <dbl> <int>
# 1 Bob 928 3
# 2 Cici 463 1
# 3 Holly 981 3
# 4 James 928 3
# 5 James 981 3
# 6 Leila 981 3
# 7 Leila 928 3
# 8 Topaz 333 2
Add ... %>% select(-n)
if you want to get rid of the n
column.
CodePudding user response:
In base R we can do
by(transform(mydat, score=ave(code, code, FUN=length)), mydat$name, \(x) {
with(x, x[score == max(score), ])}) |> c(make.row.names=FALSE) |> do.call(what=rbind)
# name code score
# 1 Bob 928 3
# 2 Cici 463 1
# 3 Holly 981 3
# 4 James 928 3
# 5 James 981 3
# 6 Leila 981 3
# 7 Leila 928 3
# 8 Topaz 333 2