Home > Software engineering >  create list from columns of data table expression
create list from columns of data table expression

Time:11-24

Consider the following dt:

dt <- data.table(a=c(1,1,2,3),b=c(4,5,6,4))

That looks like that:

> dt
   a b
1: 1 4
2: 1 5
3: 2 6
4: 3 4

I'm here aggregating each column by it's unique values and then counting how many uniquye values each column has:

 > dt[,lapply(.SD,function(agg) dt[,.N,by=agg])]

   a.agg a.N b.agg b.N
1:     1   2     4   2
2:     2   1     5   1
3:     3   1     6   1

So 1 appears twice in dt and thus a.N is 2, the same logic goes on for the other values.

But the problem is if this transformations of the original datatable have different dimensions at the end, things will get recycled.

For example this dt:

dt <- data.table(a=c(1,1,2,3,7),b=c(4,5,6,4,4))

> dt[,lapply(.SD,function(agg) dt[,.N,by=agg])]

   a.agg a.N b.agg b.N
1:     1   2     4   3
2:     2   1     5   1
3:     3   1     6   1
4:     7   1     4   3
Warning message:
In as.data.table.list(jval, .named = NULL) :
  Item 2 has 3 rows but longest item has 4; recycled with remainder.

That is no longer the right answer because b.N should have now only 3 rows and things(vector) got recycled.

This is why I would like to transform the expression dt[,lapply(.SD,function(agg) dt[,.N,by=agg])] in a list with different dimensions, with the name of items in the list being the name of the columns in the new transformed dt.

A sketch of what I mean is:

newlist
$a.agg
1 2 3 7
$a.N
2 1 1 1
$b.agg
4 5 6 4
$b.N
3 1 1

Or even better solution would be to get a datatable with a track of the columns on another column:

    dt_final
   agg N column
    1 2 a
    2 1 a
    3 1 a
    7 1 a
    4 3 b
    5 1 b
    6 1 b

CodePudding user response:

Get the data in long format and then aggregate by group.

library(data.table)

dt_long <- melt(dt, measure.vars = c('a', 'b'))
dt_long[, .N, .(variable, value)]

#   variable value N
#1:        a     1 2
#2:        a     2 1
#3:        a     3 1
#4:        a     7 1
#5:        b     4 3
#6:        b     5 1
#7:        b     6 1

In tidyverse -

library(dplyr)
library(tidyr)

dt %>%
  pivot_longer(cols = everything()) %>%
  count(name, value)
  • Related