I have a data in this format
ColA | ColB | ColC |
---|---|---|
A | 2 | 1 |
A | 1 | 1 |
B | 3 | 2 |
B | 5 | 2 |
C | 2 | 3 |
C | 5 | 3 |
A | 1 | 1 |
A | 3 | 1 |
B | 7 | 2 |
B | 1 | 2 |
I want to get a new column with the sum of the rows of ColB, something like this:
ColA | ColB | ColC | ColD |
---|---|---|---|
A | 2 | 1 | 3 |
A | 1 | 1 | 3 |
B | 3 | 2 | 8 |
B | 5 | 2 | 8 |
C | 2 | 3 | 7 |
C | 5 | 3 | 7 |
A | 1 | 1 | 4 |
A | 3 | 1 | 4 |
B | 7 | 2 | 8 |
B | 1 | 2 | 8 |
Thanks much for your help!
I tried
df$ColD <- with(df, sum(ColB[ColC == 1]))
CodePudding user response:
It seems to me that you want ColD
to have the sum of ColB
for each consecutive group defined by the values in ColA
. In which case, we may do:
library(dplyr)
df %>%
mutate(group = data.table::rleid(ColA)) %>%
group_by(group) %>%
mutate(ColD = sum(ColB)) %>%
ungroup() %>%
select(-group)
#> # A tibble: 10 x 4
#> ColA ColB ColC ColD
#> <chr> <int> <int> <int>
#> 1 A 2 1 3
#> 2 A 1 1 3
#> 3 B 3 2 8
#> 4 B 5 2 8
#> 5 C 2 3 7
#> 6 C 5 3 7
#> 7 A 1 1 4
#> 8 A 3 1 4
#> 9 B 7 2 8
#> 10 B 1 2 8
This, at any rate, is the same as the expected output.
Created on 2023-01-16 with reprex v2.0.2
Data from question in reproducible format
df <- structure(list(ColA = c("A", "A", "B", "B", "C", "C", "A", "A",
"B", "B"), ColB = c(2L, 1L, 3L, 5L, 2L, 5L, 1L, 3L, 7L, 1L),
ColC = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L)),
class = "data.frame", row.names = c(NA, -10L))
CodePudding user response:
Base R
df$ColD=ave(
df$ColB,
cumsum(c(1,abs(diff(match(df$ColA,LETTERS))))),
FUN=sum
)
ColA ColB ColC ColD
1 A 2 1 3
2 A 1 1 3
3 B 3 2 8
4 B 5 2 8
5 C 2 3 7
6 C 5 3 7
7 A 1 1 4
8 A 3 1 4
9 B 7 2 8
10 B 1 2 8
CodePudding user response:
A base
solution:
df |>
transform(ColD = ave(ColB, with(rle(ColA), rep(seq_along(values), lengths)), FUN = sum))
ColA ColB ColC ColD
1 A 2 1 3
2 A 1 1 3
3 B 3 2 8
4 B 5 2 8
5 C 2 3 7
6 C 5 3 7
7 A 1 1 4
8 A 3 1 4
9 B 7 2 8
10 B 1 2 8