Making lists by group-CodePudding

I have data as follows:

library(data.table)
dat <- structure(list(year2006 = c("1110", "1110", "1110", "1110", "1120", 
"1120", "1120", "1120"), group2006 = c("1", "2", "3", "4", "1", 
"2", "3", "4"), min2006 = c("1.35", "2", "3.7", 
"4.25", "5.6", "4.45", "3.09", "1.13"), 
    year2007 = c("1110", "1110", "1110", "1110", "1120", "1120", 
    "1120", "1120"), group2007 = c("1", "2", "3", "4", "1", 
    "2", "3", "4"), min2007 = c("5", "5.05", "5", 
    "1.59", "2.3", "3", "4.05", "5.16"
    )), row.names = c(NA, -8L), class = c("data.table", "data.frame"
))
dat

   year2006  group2006 min2006 year2007  group2007 min2007
1:     1110          1    1.35     1110          1       5
2:     1110          2       2     1110          2    5.05
3:     1110          3     3.7     1110          3       5
4:     1110          4    4.25     1110          4    1.59
5:     1120          1     5.6     1120          1     2.3
6:     1120          2    4.45     1120          2       3
7:     1120          3    3.09     1120          3    4.05
8:     1120          4    1.13     1120          4    5.16

What I would like to do, is to create a list of the numbers in min200x, per category in year200x.

Desired output:

   cat       year2006                         year2007
1:     1110  c("1.35", "2", "3.7", "4.25")    c("5", "5.05", "5", "1.59") 
2:     1120  c("5.6", "4.45", "3.09", "1.13") c("2.3", "3", "4.05", "5.16")

I thought I could do something like:

setDT(dat)[, cat := list(min2006), by=year2006]

But that does not work (it just puts the min2006 item in a new colum cat). And even if it did, it would only provide a solution for the year 2006. How should I go about this?

CodePudding user response：

I'm not sure why your columns in your test data are all character but the columns in your desired output are numeric. Also, you ask for a list of numbers by group but your expected output shows a vector.

Nevertheless, here's a tidyverse solution that creates list columns.

library(tidyverse)

x <- dat %>% 
  mutate(across(everything(), as.numeric)) %>% 
  group_by(year2006) %>% 
  select(year2006, starts_with("min")) %>% 
  summarise(across(everything(), lst))
x
# A tibble: 2 × 3
  year2006 min2006      min2007     
     <dbl> <named list> <named list>
1     1110 <dbl [4]>    <dbl [4]>   
2     1120 <dbl [4]>    <dbl [4]>

and, for example,

x$min2006
$min2006
[1] 1.35 2.00 3.70 4.25

$min2006
[1] 5.60 4.45 3.09 1.13

If your inputs are actually numeric, you can lose the mutate.

Edit

... and to get the correct name for the grouping column, you can add %>% rename(cat=year2006) to the pipe. Apologies for the omission.

CodePudding user response：

a similar approach

data.table


library(data.table)
COLS <- grep(names(df), pattern = "^min", value = TRUE)

setDT(df)[, lapply(.SD, list), .SDcol = COLS, by = year2006]
#>    year2006            min2006         min2007
#> 1:     1110    1.35,2,3.7,4.25   5,5.05,5,1.59
#> 2:     1120 5.6,4.45,3.09,1.13 2.3,3,4.05,5.16

^{Created on 2022-05-31 by the reprex package (v2.0.1)}

CodePudding user response：

Here is also a base R solution,

l1 <- lapply(split.default(dat, gsub('\\D ', '', names(dat))), function(i) 
                                       aggregate(as.matrix(i[3]) ~ as.matrix(i[1]), i, list))

do.call(cbind, l1)[-3]

#  year2006          2006.min2006       2007.min2007
#1     1110    1.35, 2, 3.7, 4.25   5, 5.05, 5, 1.59
#2     1120 5.6, 4.45, 3.09, 1.13 2.3, 3, 4.05, 5.16