Home > Back-end >  Frequency count for multiple columns with same values
Frequency count for multiple columns with same values

Time:10-25

I'd like to make a frequency count individually for multiple columns with same possible values. The idea is to keep all columns from original data table, just adding a new one for levels and aggregating. Here is an example of input data:

foo <- data.table(a = c(1,3,2,3,3), b = c(2,3,3,1,1), c = c(3,1,2,3,2))
#   a b c
#1: 1 2 3
#2: 3 3 1
#3: 2 3 2
#4: 3 1 3
#5: 3 1 2

And desired output:

data.table(levels = 1:3, a = c(1,1,3), b = c(2,1,2), c = c(1,2,2))
#   levels a b c
#1:      1 1 2 1
#2:      2 1 1 2
#3:      3 3 2 2

Thanks for helping !

CodePudding user response:

We may use

library(data.table)
dcast(melt(foo)[, .N, .(variable, levels = value)],
     levels ~ variable, value.var = 'N')

-output

Key: <levels>
   levels     a     b     c
    <num> <int> <int> <int>
1:      1     1     2     1
2:      2     1     1     2
3:      3     3     2     2

Or using base R

table(stack(foo))
      ind
values a b c
     1 1 2 1
     2 1 1 2
     3 3 2 2

CodePudding user response:

You could also use recast from reshape2:

reshape2::recast(foo, value~variable)
# No id variables; using all as measure variables
# Aggregation function missing: defaulting to length
  value a b c
1     1 1 2 1
2     2 1 1 2
3     3 3 2 2

or even

reshape2::recast(foo, value~variable, length)

CodePudding user response:

foo |> 
  melt() |> 
  dcast(value ~ variable, fun.aggregate = length)
#    value a b c
# 1:     1 1 2 1
# 2:     2 1 1 2
# 3:     3 3 2 2

CodePudding user response:

Here is an option using purrr and dplyr from the tidyverse:

library(purrr)
library(dplyr)

foo %>% 
  imap(~ as.data.frame(table(.x, dnn = "levels"), responseName = .y)) %>% 
  reduce(left_join, by = "levels")

Alternatively, you could use the pivot functions from tidyr:

library(dplyr)
library(tidyr)

foo %>% 
  pivot_longer(everything(),
               values_to = "levels") %>% 
  count(name, levels) %>% 
  pivot_wider(id_cols = levels,
              names_from = name,
              values_from = n)
  • Related