How do I count the occurrences of a data point in a data frame?

I have this table and I want to retain and count only the id in which the strings A and D are most represented. For example, A and D are more represented in the id "abc" than in the id "hil".

string	id	start	end
A	abc	0	1
A	abc	2	3
B	efg	1	3
A	hil	5	6
A	abc	6	7
D	abc	7	8
D	abc	1	2
D	hil	3	4

How can I obtain the id in which those strings are most represented?

CodePudding user response：

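You can use this code. Note that the counts have to be sorted in descending order before taking the first row; an empty arrange() does not reorder anything:

library(dplyr)

df %>%
  filter(string %in% c("A", "D")) %>%  # keep only rows for strings A and D
  count(id, sort = TRUE) %>%           # rows per id, largest count first
  slice(1)                             # keep the id with the highest count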

Output:

# A tibble: 1 × 2
  id        n
  <chr> <int>
1 abc       5

CodePudding user response：

In base R, you can get the most common id for each string like this:

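# Cross-tabulate ids (rows) against strings (columns) once
tab <- table(df$id, df$string)
# For each string (column), return the id with the highest count
apply(tab, 2, function(x) rownames(tab)[which.max(x)])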
#>     A     B     D 
#>  "abc" "efg" "abc"
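
The per-string result above can also be collapsed into a single answer for A and D combined, which is what the question asks for. The following is only a minimal base R sketch: it assumes `df` is rebuilt from the table in the question with `read.table()`, and the names `counts` and `top_ids` are made up for illustration. It returns every id that shares the highest count, so a tie would yield more than one id.

```r
# Rebuild the question's table (assumption: all columns parse as shown in the post)
df <- read.table(text = "
string id start end
A abc 0 1
A abc 2 3
B efg 1 3
A hil 5 6
A abc 6 7
D abc 7 8
D abc 1 2
D hil 3 4
", header = TRUE, stringsAsFactors = FALSE)

# Keep only the ids of rows whose string is A or D, then count rows per id
counts <- table(df$id[df$string %in% c("A", "D")])

# All ids that reach the highest count (more than one if there is a tie)
top_ids <- names(counts)[as.vector(counts) == max(counts)]
top_ids
#> [1] "abc"
```

Here `counts` holds 5 for "abc" and 2 for "hil", matching the n = 5 reported in the dplyr answer.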