I have data as follows:
library(data.table)
dat <- structure(list(year2006 = c("1110", "1110", "1110", "1110", "1120",
"1120", "1120", "1120"), group2006 = c("1", "2", "3", "4", "1",
"2", "3", "4"), min2006 = c("1.35", "2", "3.7",
"4.25", "5.6", "4.45", "3.09", "1.13"),
year2007 = c("1110", "1110", "1110", "1110", "1120", "1120",
"1120", "1120"), group2007 = c("1", "2", "3", "4", "1",
"2", "3", "4"), min2007 = c("5", "5.05", "5",
"1.59", "2.3", "3", "4.05", "5.16"
)), row.names = c(NA, -8L), class = c("data.table", "data.frame"
))
dat
year2006 group2006 min2006 year2007 group2007 min2007
1: 1110 1 1.35 1110 1 5
2: 1110 2 2 1110 2 5.05
3: 1110 3 3.7 1110 3 5
4: 1110 4 4.25 1110 4 1.59
5: 1120 1 5.6 1120 1 2.3
6: 1120 2 4.45 1120 2 3
7: 1120 3 3.09 1120 3 4.05
8: 1120 4 1.13 1120 4 5.16
What I would like to do, is to create a list of the numbers in min200x
, per category in year200x
.
Desired output:
cat year2006 year2007
1: 1110 c("1.35", "2", "3.7", "4.25") c("5", "5.05", "5", "1.59")
2: 1120 c("5.6", "4.45", "3.09", "1.13") c("2.3", "3", "4.05", "5.16")
I thought I could do something like:
setDT(dat)[, cat := list(min2006), by=year2006]
But that does not work (it just puts the min2006
item in a new colum cat
). And even if it did, it would only provide a solution for the year 2006. How should I go about this?
CodePudding user response:
I'm not sure why your columns in your test data are all character but the columns in your desired output are numeric. Also, you ask for a list of numbers by group but your expected output shows a vector.
Nevertheless, here's a tidyverse solution that creates list columns.
library(tidyverse)
x <- dat %>%
mutate(across(everything(), as.numeric)) %>%
group_by(year2006) %>%
select(year2006, starts_with("min")) %>%
summarise(across(everything(), lst))
x
# A tibble: 2 × 3
year2006 min2006 min2007
<dbl> <named list> <named list>
1 1110 <dbl [4]> <dbl [4]>
2 1120 <dbl [4]> <dbl [4]>
and, for example,
x$min2006
$min2006
[1] 1.35 2.00 3.70 4.25
$min2006
[1] 5.60 4.45 3.09 1.13
If your inputs are actually numeric, you can lose the mutate
.
Edit
... and to get the correct name for the grouping column, you can add %>% rename(cat=year2006)
to the pipe. Apologies for the omission.
CodePudding user response:
a similar approach
data.table
library(data.table)
COLS <- grep(names(df), pattern = "^min", value = TRUE)
setDT(df)[, lapply(.SD, list), .SDcol = COLS, by = year2006]
#> year2006 min2006 min2007
#> 1: 1110 1.35,2,3.7,4.25 5,5.05,5,1.59
#> 2: 1120 5.6,4.45,3.09,1.13 2.3,3,4.05,5.16
Created on 2022-05-31 by the reprex package (v2.0.1)
CodePudding user response:
Here is also a base R solution,
l1 <- lapply(split.default(dat, gsub('\\D ', '', names(dat))), function(i)
aggregate(as.matrix(i[3]) ~ as.matrix(i[1]), i, list))
do.call(cbind, l1)[-3]
# year2006 2006.min2006 2007.min2007
#1 1110 1.35, 2, 3.7, 4.25 5, 5.05, 5, 1.59
#2 1120 5.6, 4.45, 3.09, 1.13 2.3, 3, 4.05, 5.16