Home > Software engineering >  R: Sum of columns based on range of column name
R: Sum of columns based on range of column name

Time:09-20

I have a data frame with number of values for every year:

   V1 11 14 15 17 19 20 21 22 24 26 28 29 3 31 32 35 37 39 47 5 51 54 55 6 63 66 7 9 91
 2008  0  0  0  0  0  1  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  0 0  0  0 0 0  0
 2011  0  0  0  0  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  0 0  0  1 0 0  0
 2013  0  0  0  0  0  0  0  0  0  0  0  0 1  0  0  0  0  0  0 0  0  0  0 0  0  0 0 0  0
 2014  0  0  0  0  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  0 0  0  0 1 0  0
 2015  0  0  0  0  0  0  0  0  0  0  0  1 0  0  0  0  0  0  1 2  0  1  0 0  0  0 0 0  0
 2016  0  0  0  0  0  0  0  1  0  0  1  0 1  0  0  1  0  0  0 0  0  0  0 0  0  0 0 0  0
 2017  0  0  0  0  0  0  0  0  0  0  2  0 0  0  0  0  0  0  0 0  0  0  0 0  0  0 1 0  0
 2018  0  0  0  0  0  0  0  1  1  1  0  0 0  0  0  0  1  1  0 0  0  0  1 0  1  0 0 1  0
 2019  0  1  0  0  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  1  0 0  0  0 1 0  0
 2020  0  0  0  0  1  2  1  0  0  0  0  0 0  2  1  1  0  0  0 0  1  0  0 0  1  0 2 1  0
 2021  1  0  1  0  0  0  0  0  0  1  0  0 0  0  1  0  0  0  0 0  0  0  0 1  0  0 3 0  1
 2022  0  0  0  2  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  1 0  0  0 1 0  0

and I need to sum the numbers by the range of header values, so the result should look like this:

  V1  0-10  11-20  21-30  31-40  41-50  51-60  61-70  71-80  81-90  91-100
2008     0      0      0      0      0      0      0      0      0       0
2011     0      0      0      0      0      0      1      0      0       0
2013     1      0      0      0      0      0      0      0      0       0
2014     1      0      0      0      0      0      0      0      0       0
2015     2      0      1      0      1      1      0      0      0       0
2016     1      0      2      1      0      0      0      0      0       0
2017     1      0      2      0      0      0      0      0      0       0
2018     1      0      3      2      0      1      1      0      0       0
2019     1      1      0      0      0      1      0      0      0       0
2020     3      1      1      4      0      1      1      0      0       0
2021     4      2      1      1      0      0      0      0      0       1
2022     1      2      0      0      0      1      0      0      0       0

I tried this code, but it sums the numbers by the second digit of header value, not by the first one as I need

xtabs(values~., transform(cbind(df[1],stack(df[-1])), ind = sub('.','',ind)))

How should I get the result table I need, plese? Thanks in advance.

CodePudding user response:

Convert the data to long format

library(reshape2)
d.m = melt(dat, id.vars=1)

Now group the values:

d.m$group = cut(d.m$variable, seq(0,100,10))

aggregate

dm2 = aggregate(d.m$value, by=list(V1=d.m$X, group=d.m$group), FUN=sum)

And unmelt, with dcast:

dcast(dm2, V1~group)
     V1 (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (90,100]
1  2008      0       1       0       0       0       0       0        0
2  2011      0       0       0       0       0       0       1        0
3  2013      1       0       0       0       0       0       0        0
4  2014      1       0       0       0       0       0       0        0
5  2015      2       0       1       0       1       1       0        0
6  2016      1       0       2       1       0       0       0        0
7  2017      1       0       2       0       0       0       0        0
8  2018      1       0       3       2       0       1       1        0
9  2019      1       1       0       0       0       1       0        0
10 2020      3       3       1       4       0       1       1        0
11 2021      4       2       1       1       0       0       0        1
12 2022      1       2       0       0       0       1       0        0

There's probably a cleaner way to do it with dplyr

CodePudding user response:

You can do:

library(tidyverse)

d <- read.table(text = " V1 11 14 15 17 19 20 21 22 24 26 28 29 3 31 32 35 37 39 47 5 51 54 55 6 63 66 7 9 91
 2008  0  0  0  0  0  1  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  0 0  0  0 0 0  0
 2011  0  0  0  0  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  0 0  0  1 0 0  0
 2013  0  0  0  0  0  0  0  0  0  0  0  0 1  0  0  0  0  0  0 0  0  0  0 0  0  0 0 0  0
 2014  0  0  0  0  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  0 0  0  0 1 0  0
 2015  0  0  0  0  0  0  0  0  0  0  0  1 0  0  0  0  0  0  1 2  0  1  0 0  0  0 0 0  0
 2016  0  0  0  0  0  0  0  1  0  0  1  0 1  0  0  1  0  0  0 0  0  0  0 0  0  0 0 0  0
 2017  0  0  0  0  0  0  0  0  0  0  2  0 0  0  0  0  0  0  0 0  0  0  0 0  0  0 1 0  0
 2018  0  0  0  0  0  0  0  1  1  1  0  0 0  0  0  0  1  1  0 0  0  0  1 0  1  0 0 1  0
 2019  0  1  0  0  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  1  0 0  0  0 1 0  0
 2020  0  0  0  0  1  2  1  0  0  0  0  0 0  2  1  1  0  0  0 0  1  0  0 0  1  0 2 1  0
 2021  1  0  1  0  0  0  0  0  0  1  0  0 0  0  1  0  0  0  0 0  0  0  0 1  0  0 3 0  1
 2022  0  0  0  2  0  0  0  0  0  0  0  0 0  0  0  0  0  0  0 0  0  0  1 0  0  0 1 0  0", 
            header = TRUE, check.names = FALSE)

d <- d %>%
  pivot_longer(cols = -V1, names_to = "category", values_to = "val") %>%
  mutate(new_cat = cut(as.numeric(category), seq(0, 100, by = 10)))

xtabs(val ~ V1   new_cat, d)
  •  Tags:  
  • r
  • Related