How can I rank the first 4 group of my dataframe associated to the highest value in the count column and create a 5th group summing up the remaining groups and their associated values ?
What I did so far:
dummy_dataframe <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D","G"))
df_aggregate <- aggregate(cbind(count = group) ~ group,
data = dumy_dataframe,
FUN = function(x){NROW(x)})
df_sliced <- df_aggregate %>%
arrange(desc(count)) %>%
slice(1:4)
With the code above I get a dataframe with the 4 groups associated to the highest value but how I could have a fith group summing up the value of the missing group (E, F and G) ? For instance something like this:
group count
1 A 3
2 B 1
3 C 2
4 D 2
5 others 3
CodePudding user response:
You can run some tidyverse operations directly on your original dataframe:
library(tidyverse)
dummy_dataframe %>%
count(group) %>%
mutate(id = if_else(row_number() < 5, 1L, 2L)) %>%
group_by(id) %>%
arrange(id, -n) %>%
mutate(group = if_else(id == 2L, "others", group),
n = if_else(group == "others", sum(n), n)) %>%
ungroup() %>%
distinct() %>%
select(-id)
which gives:
# A tibble: 5 x 2
group n
<chr> <int>
1 A 3
2 C 2
3 D 2
4 B 1
5 others 3
CodePudding user response:
Short & sweet:
result<-rbind(df_aggregate[order(df_aggregate$count,decreasing = T),][c(1:4),],c("rest",sum(df_aggregate[order(df_aggregate$count,decreasing = T),][c(5:nrow(df_aggregate)),2])))
CodePudding user response:
I would go completely with the dplyr package and its possibility:
library(dplyr)
dummy_dataframe <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D","G"))
df_aggregate <- dummy_dataframe %>%
group_by(group
) %>%
summarise(count = n()
) %>%
arrange(desc(count)
)
df_top_4_groups <- df_aggregate %>%
slice(1:4)
df_others <- df_aggregate %>%
anti_join(df_top_4_groups, by = "group"
) %>%
mutate(group = "others"
) %>%
group_by(group
) %>%
summarise(count = n()
)
df_finale <- df_top_4_groups %>%
bind_rows(df_others)
df_finale
A tibble: 5 x 2
group count
<chr> <int>
1 A 3
2 C 2
3 D 2
4 B 1
5 others 3
Your use of aggregate isn't wrong - quite cool ;) - but i think using the pipe from top to bottom makes it more readable.