I would like to calculate the sum of each flower in each year in R. Below is an example of how the table looks (Table 1) and what I want the outcome to be (Table 2). I know how to do the code calculation in a long table format but I am not sure how to do it in a wide table format. Note: I am using package: dplyr
(Table 1)
flower | 1902 | 1950 | 2010 | 2012 | 2021 |
---|---|---|---|---|---|
lily | 23 | 0 | 0 | 8 | 5 |
rose | 50 | 60 | 5 | 16 | 0 |
daisy | 30 | 7 | 10 | 2 | 0 |
I need to calculate the sum for each flower in each year. The end result should give me:
(Table 2)
flower | 1902 | 1950 | 2010 | 2012 | 2021 |
---|---|---|---|---|---|
lily | 23 | 23 | 23 | 31 | 36 |
rose | 50 | 110 | 115 | 131 | 131 |
daisy | 30 | 37 | 47 | 49 | 49 |
CodePudding user response:
One option involving dplyr
and purrr
might be:
dat %>%
mutate(pmap_dfr(across(-1), ~ cumsum(c(...))))
flower X1902 X1950 X2010 X2012 X2021
1 lily 23 23 23 31 36
2 rose 50 110 115 131 131
3 daisy 30 37 47 49 49
CodePudding user response:
Using rowCumsums
from matrixStats
library(matrixStats)
df1[-1] <- rowCumsums(as.matrix(df1[-1]))
-output
df1
flower X1902 X1950 X2010 X2012 X2021
1 lily 23 23 23 31 36
2 rose 50 110 115 131 131
3 daisy 30 37 47 49 49
CodePudding user response:
Here is one way of getting your expected result:
Your data frame :
dat <- structure(list(flower = c("lily", "rose", "daisy"), X1902 = c(23L,
50L, 30L), X1950 = c(0L, 60L, 7L), X2010 = c(0L, 5L, 10L), X2012 = c(8L,
16L, 2L), X2021 = c(5L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
Apply a function that calculate the cumulative sums and apply to each row of the data at column 2 to 6:
dat[1:nrow(dat), 2:6] <- t(apply(dat[1:nrow(dat), 2:6], 1, function(x) cumsum(c(x))))
# The result
dat
flower X1902 X1950 X2010 X2012 X2021
1 lily 23 23 23 31 36
2 rose 50 110 115 131 131
3 daisy 30 37 47 49 49
@benson23 has kindly suggested the following simpler code to get the same result:
dat[, 2:6] <- t(apply(dat[,2:6], 1, cumsum))
flower X1902 X1950 X2010 X2012 X2021
1 lily 23 23 23 31 36
2 rose 50 110 115 131 131
3 daisy 30 37 47 49 49
CodePudding user response:
You can use apply
with cumsum
, plus a little bit of re-formatting.
setNames(as.data.frame(cbind(df[, 1], t(apply(df[, -1], 1, cumsum)))), colnames(df))
flower X1902 X1950 X2010 X2012 X2021
1 lily 23 23 23 31 36
2 rose 50 110 115 131 131
3 daisy 30 37 47 49 49
Data
df <- structure(list(flower = c("lily", "rose", "daisy"), X1902 = c(23L,
50L, 30L), X1950 = c(0L, 60L, 7L), X2010 = c(0L, 5L, 10L), X2012 = c(8L,
16L, 2L), X2021 = c(5L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
CodePudding user response:
Here is an alternative using pivoting:
library(dplyr)
library(tidyr)
dat %>%
pivot_longer(-flower) %>%
group_by(flower) %>%
mutate(value = cumsum(value)) %>%
pivot_wider() %>%
ungroup()
flower X1902 X1950 X2010 X2012 X2021
<chr> <int> <int> <int> <int> <int>
1 lily 23 23 23 31 36
2 rose 50 110 115 131 131
3 daisy 30 37 47 49 49