I have a data frame with number of values for every year:
V1 11 14 15 17 19 20 21 22 24 26 28 29 3 31 32 35 37 39 47 5 51 54 55 6 63 66 7 9 91
2008 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2015 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 2 0 1 0 0 0 0 0 0 0
2016 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2017 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2018 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0
2019 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
2020 0 0 0 0 1 2 1 0 0 0 0 0 0 2 1 1 0 0 0 0 1 0 0 0 1 0 2 1 0
2021 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 3 0 1
2022 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
and I need to sum the numbers by the range of header values, so the result should look like this:
V1 0-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
2008 0 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 1 0 0 0
2013 1 0 0 0 0 0 0 0 0 0
2014 1 0 0 0 0 0 0 0 0 0
2015 2 0 1 0 1 1 0 0 0 0
2016 1 0 2 1 0 0 0 0 0 0
2017 1 0 2 0 0 0 0 0 0 0
2018 1 0 3 2 0 1 1 0 0 0
2019 1 1 0 0 0 1 0 0 0 0
2020 3 1 1 4 0 1 1 0 0 0
2021 4 2 1 1 0 0 0 0 0 1
2022 1 2 0 0 0 1 0 0 0 0
I tried this code, but it sums the numbers by the second digit of header value, not by the first one as I need
xtabs(values~., transform(cbind(df[1],stack(df[-1])), ind = sub('.','',ind)))
How should I get the result table I need, plese? Thanks in advance.
CodePudding user response:
Convert the data to long format
library(reshape2)
d.m = melt(dat, id.vars=1)
Now group the values:
d.m$group = cut(d.m$variable, seq(0,100,10))
aggregate
dm2 = aggregate(d.m$value, by=list(V1=d.m$X, group=d.m$group), FUN=sum)
And unmelt, with dcast
:
dcast(dm2, V1~group)
V1 (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (90,100]
1 2008 0 1 0 0 0 0 0 0
2 2011 0 0 0 0 0 0 1 0
3 2013 1 0 0 0 0 0 0 0
4 2014 1 0 0 0 0 0 0 0
5 2015 2 0 1 0 1 1 0 0
6 2016 1 0 2 1 0 0 0 0
7 2017 1 0 2 0 0 0 0 0
8 2018 1 0 3 2 0 1 1 0
9 2019 1 1 0 0 0 1 0 0
10 2020 3 3 1 4 0 1 1 0
11 2021 4 2 1 1 0 0 0 1
12 2022 1 2 0 0 0 1 0 0
There's probably a cleaner way to do it with dplyr
CodePudding user response:
You can do:
library(tidyverse)
d <- read.table(text = " V1 11 14 15 17 19 20 21 22 24 26 28 29 3 31 32 35 37 39 47 5 51 54 55 6 63 66 7 9 91
2008 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2015 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 2 0 1 0 0 0 0 0 0 0
2016 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2017 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2018 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0
2019 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
2020 0 0 0 0 1 2 1 0 0 0 0 0 0 2 1 1 0 0 0 0 1 0 0 0 1 0 2 1 0
2021 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 3 0 1
2022 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0",
header = TRUE, check.names = FALSE)
d <- d %>%
pivot_longer(cols = -V1, names_to = "category", values_to = "val") %>%
mutate(new_cat = cut(as.numeric(category), seq(0, 100, by = 10)))
xtabs(val ~ V1 new_cat, d)