I have this table and I want to keep and count only the id in which the strings A and D are most represented. For example, A and D are more represented in the id "abc" than in the id "hil".
string | id | start | end |
---|---|---|---|
A | abc | 0 | 1 |
A | abc | 2 | 3 |
B | efg | 1 | 3 |
A | hil | 5 | 6 |
A | abc | 6 | 7 |
D | abc | 7 | 8 |
D | abc | 1 | 2 |
D | hil | 3 | 4 |
How can I obtain the id in which those strings are most represented?
CodePudding user response:
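You can filter for A and D, count the rows per id, sort by that count in descending order, and keep the first row:
library(dplyr)

df %>%
  filter(string == "A" | string == "D") %>%  # keep only rows for strings A and D
  group_by(id) %>%
  count(id) %>%                              # number of matching rows per id
  ungroup() %>%
  arrange(desc(n)) %>%                       # largest count first (a bare arrange() sorts nothing)
  slice(1)                                   # keep the top id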
Output:
# A tibble: 1 × 2
id n
<chr> <int>
1 abc 5
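For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table from the question as a data frame and picks the top id with `slice_max()` instead of sorting and slicing. The name `top_id` is only illustrative, and it assumes dplyr 1.0.0 or later (where `slice_max()` is available).

```r
library(dplyr)

# Rebuild the table from the question (column names taken from its header).
df <- data.frame(
  string = c("A", "A", "B", "A", "A", "D", "D", "D"),
  id     = c("abc", "abc", "efg", "hil", "abc", "abc", "abc", "hil"),
  start  = c(0, 2, 1, 5, 6, 7, 1, 3),
  end    = c(1, 3, 3, 6, 7, 8, 2, 4)
)

top_id <- df %>%
  filter(string %in% c("A", "D")) %>%  # keep only rows for strings A and D
  count(id) %>%                        # one row per id, with the row count in n
  slice_max(n, n = 1)                  # row(s) with the highest count

top_id
```

With the question's data this gives `abc` with `n = 5`. By default `slice_max()` keeps ties, so if two ids shared the highest count, both rows would be returned.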
CodePudding user response:
In base R, you can get the most common id for each string like this:
tab <- table(df$id, df$string)  # ids in rows, strings in columns
apply(tab, 2, function(x) rownames(tab)[which.max(x)])
#> A B D
#> "abc" "efg" "abc"
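The result above is per string. If you want a single id for A and D combined, as the question asks, one possible base R sketch is to subset first and then tabulate the ids. The data frame is rebuilt here from the question's table (only the two columns that matter), and `ad` and `counts` are just illustrative names.

```r
# Rebuild the relevant columns of the question's table.
df <- data.frame(
  string = c("A", "A", "B", "A", "A", "D", "D", "D"),
  id     = c("abc", "abc", "efg", "hil", "abc", "abc", "abc", "hil")
)

ad <- df[df$string %in% c("A", "D"), ]  # keep only rows for strings A and D
counts <- table(ad$id)                  # number of A/D rows per id

names(counts)[which.max(counts)]        # id with the highest combined count
counts[which.max(counts)]               # the same id together with its count
```

With the question's data this returns "abc", with a count of 5. Note that `which.max()` returns only the first maximum, so ties are not reported.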