Home > Software engineering >  taking sum of rows in R based on conditions
taking sum of rows in R based on conditions

Time:01-16

I have a data in this format

ColA ColB ColC
A 2 1
A 1 1
B 3 2
B 5 2
C 2 3
C 5 3
A 1 1
A 3 1
B 7 2
B 1 2

I want to get a new column with the sum of the rows of ColB, something like this:

ColA ColB ColC ColD
A 2 1 3
A 1 1 3
B 3 2 8
B 5 2 8
C 2 3 7
C 5 3 7
A 1 1 4
A 3 1 4
B 7 2 8
B 1 2 8

Thanks much for your help!

I tried

df$ColD <- with(df, sum(ColB[ColC == 1]))

CodePudding user response:

It seems to me that you want ColD to have the sum of ColB for each consecutive group defined by the values in ColA. In which case, we may do:

library(dplyr)

df %>% 
  mutate(group = data.table::rleid(ColA)) %>%
  group_by(group) %>%
  mutate(ColD = sum(ColB)) %>%
  ungroup() %>%
  select(-group)
#> # A tibble: 10 x 4
#>    ColA   ColB  ColC  ColD
#>    <chr> <int> <int> <int>
#>  1 A         2     1     3
#>  2 A         1     1     3
#>  3 B         3     2     8
#>  4 B         5     2     8
#>  5 C         2     3     7
#>  6 C         5     3     7
#>  7 A         1     1     4
#>  8 A         3     1     4
#>  9 B         7     2     8
#> 10 B         1     2     8

This, at any rate, is the same as the expected output.

Created on 2023-01-16 with reprex v2.0.2


Data from question in reproducible format

df <- structure(list(ColA = c("A", "A", "B", "B", "C", "C", "A", "A", 
"B", "B"), ColB = c(2L, 1L, 3L, 5L, 2L, 5L, 1L, 3L, 7L, 1L), 
    ColC = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L)), 
class = "data.frame", row.names = c(NA, -10L))

CodePudding user response:

Base R

df$ColD=ave(
  df$ColB,
  cumsum(c(1,abs(diff(match(df$ColA,LETTERS))))),
  FUN=sum
)

   ColA ColB ColC ColD
1     A    2    1    3
2     A    1    1    3
3     B    3    2    8
4     B    5    2    8
5     C    2    3    7
6     C    5    3    7
7     A    1    1    4
8     A    3    1    4
9     B    7    2    8
10    B    1    2    8

CodePudding user response:

A base solution:

df |>
  transform(ColD = ave(ColB, with(rle(ColA), rep(seq_along(values), lengths)), FUN = sum))

   ColA ColB ColC ColD
1     A    2    1    3
2     A    1    1    3
3     B    3    2    8
4     B    5    2    8
5     C    2    3    7
6     C    5    3    7
7     A    1    1    4
8     A    3    1    4
9     B    7    2    8
10    B    1    2    8
  • Related