I have a sample dataset:
df <- data.frame(category = c("A", "A", "B", "C", "C", "D", "E", "C", "E", "A", "B", "C", "B", "B", "B", "D", "D", "D", "D", "B"), year = c(1, 2, 1, 2, 3, 2, 3, 1, 3, 2, 1, 1, 2, 1, 2, 3, 1, 2, 3, 1))
and would like to create a cross-tabulation of year
and category
such that only the 3 most frequent categories are in the table and also sorted by total number of occurences:
1 2 3
B 4 2 0
D 1 2 2
C 2 1 1
Using something like
df %>%
add_count(category) %>%
filter(n %in% tail(sort(unique(n)),3)) %>%
arrange(desc(n)) %>% {table(.$category, .$year)}
will filter for the three most occurring categories but leave the table unsorted
1 2 3
B 4 2 0
C 2 1 1
D 1 2 2
CodePudding user response:
This should give you what you want.
# Make a table
df.t <- table(df)
# Order by top occurrences (sum over margin 1)
df.t <- df.t[order(apply(df.t, 1, sum), decreasing=TRUE),]
# Keep top 3 results
df.t <- df.t[1:3,]
Output:
year
category 1 2 3
B 4 2 0
D 1 2 2
C 2 1 1
CodePudding user response:
It's not elegent solution using base R but it works
result <- as.data.frame.matrix(table(df))
result$sum <- rowSums(result)
result <- result[order(-result$sum),]
result <- result[1:3,]
result$sum <- NULL
1 2 3
B 4 2 0
D 1 2 2
C 2 1 1
CodePudding user response:
You'd want to arrange by the rowsums after creating table. If you want to stay (more) within tidyverse
, e.g.:
df |>
janitor::tabyl(category, year) |>
arrange(desc(rowSums(across(where(is.numeric))))) |>
head(3)
Here with janitor::tabyl()
, but you could also use dplyr::tally()
and tidyr::pivot_longer()
directly or do df |> table() |> as.data.frame.matrix()
like @Adamm.