Home > Software engineering >  Making lists by group
Making lists by group

Time:05-31

I have data as follows:

library(data.table)
dat <- structure(list(year2006 = c("1110", "1110", "1110", "1110", "1120", 
"1120", "1120", "1120"), group2006 = c("1", "2", "3", "4", "1", 
"2", "3", "4"), min2006 = c("1.35", "2", "3.7", 
"4.25", "5.6", "4.45", "3.09", "1.13"), 
    year2007 = c("1110", "1110", "1110", "1110", "1120", "1120", 
    "1120", "1120"), group2007 = c("1", "2", "3", "4", "1", 
    "2", "3", "4"), min2007 = c("5", "5.05", "5", 
    "1.59", "2.3", "3", "4.05", "5.16"
    )), row.names = c(NA, -8L), class = c("data.table", "data.frame"
))
dat

   year2006  group2006 min2006 year2007  group2007 min2007
1:     1110          1    1.35     1110          1       5
2:     1110          2       2     1110          2    5.05
3:     1110          3     3.7     1110          3       5
4:     1110          4    4.25     1110          4    1.59
5:     1120          1     5.6     1120          1     2.3
6:     1120          2    4.45     1120          2       3
7:     1120          3    3.09     1120          3    4.05
8:     1120          4    1.13     1120          4    5.16

What I would like to do, is to create a list of the numbers in min200x, per category in year200x.

Desired output:

   cat       year2006                         year2007
1:     1110  c("1.35", "2", "3.7", "4.25")    c("5", "5.05", "5", "1.59") 
2:     1120  c("5.6", "4.45", "3.09", "1.13") c("2.3", "3", "4.05", "5.16")

I thought I could do something like:

setDT(dat)[, cat := list(min2006), by=year2006]

But that does not work (it just puts the min2006 item in a new colum cat). And even if it did, it would only provide a solution for the year 2006. How should I go about this?

CodePudding user response:

I'm not sure why your columns in your test data are all character but the columns in your desired output are numeric. Also, you ask for a list of numbers by group but your expected output shows a vector.

Nevertheless, here's a tidyverse solution that creates list columns.

library(tidyverse)

x <- dat %>% 
  mutate(across(everything(), as.numeric)) %>% 
  group_by(year2006) %>% 
  select(year2006, starts_with("min")) %>% 
  summarise(across(everything(), lst))
x
# A tibble: 2 × 3
  year2006 min2006      min2007     
     <dbl> <named list> <named list>
1     1110 <dbl [4]>    <dbl [4]>   
2     1120 <dbl [4]>    <dbl [4]>

and, for example,

x$min2006
$min2006
[1] 1.35 2.00 3.70 4.25

$min2006
[1] 5.60 4.45 3.09 1.13

If your inputs are actually numeric, you can lose the mutate.

Edit

... and to get the correct name for the grouping column, you can add %>% rename(cat=year2006) to the pipe. Apologies for the omission.

CodePudding user response:

a similar approach

data.table


library(data.table)
COLS <- grep(names(df), pattern = "^min", value = TRUE)

setDT(df)[, lapply(.SD, list), .SDcol = COLS, by = year2006]
#>    year2006            min2006         min2007
#> 1:     1110    1.35,2,3.7,4.25   5,5.05,5,1.59
#> 2:     1120 5.6,4.45,3.09,1.13 2.3,3,4.05,5.16

Created on 2022-05-31 by the reprex package (v2.0.1)

CodePudding user response:

Here is also a base R solution,

l1 <- lapply(split.default(dat, gsub('\\D ', '', names(dat))), function(i) 
                                       aggregate(as.matrix(i[3]) ~ as.matrix(i[1]), i, list))

do.call(cbind, l1)[-3]

#  year2006          2006.min2006       2007.min2007
#1     1110    1.35, 2, 3.7, 4.25   5, 5.05, 5, 1.59
#2     1120 5.6, 4.45, 3.09, 1.13 2.3, 3, 4.05, 5.16
  • Related