Home > Back-end >  dplyr & tibble - conditional sum of two rows based on column value
dplyr & tibble - conditional sum of two rows based on column value

Time:09-27

Given a tibble looking as the one below, I'm trying to use the Tidyverse to perform a conditional sum based on the value of Item in each of the two Fields. Specifically, both for foo and bar, I'd like to sum the value corresponding to item a to that of item b and then remove the row of the former. The result I'm looking for is exemplified in table 2.

data <- tibble(Field = rep(c("foo", "bar"), each=4),
               Item = rep(c("a", "b", "c", "d"), 2),
               Value = runif(8))
# table 1                          # table 2
| Field | Item |   Value   |       | Field | Item |   Value   |
|-------|------|-----------|       |-------|------|-----------|
|  foo  |  a   | 0.8167347 |       |  foo  |  b   | 0.9583989 | <== 0.8167347   0.1416642
|  foo  |  b   | 0.1416642 |       |  foo  |  c   | 0.7054814 |
|  foo  |  c   | 0.7054814 |       |  foo  |  d   | 0.1196948 |
|  foo  |  d   | 0.1196948 |       |--------------------------|
|--------------------------|       |  bar  |  b   | 0.6177568 | <== 0.3604500   0.2573068
|  bar  |  a   | 0.3604500 |       |  bar  |  c   | 0.7003040 |
|  bar  |  b   | 0.2573068 |       |  bar  |  d   | 0.8131556 |
|  bar  |  c   | 0.7003040 |
|  bar  |  d   | 0.8131556 |

So far I didn't manage to get anywhere near the expected result. I know how to use dplyr's grouping function to isolate items belonging to either one of the two fields, but I have no idea how to select the value of a and sum it to b after the grouping has been performed.

CodePudding user response:

You can change Item, to receive b when it is equal a, then summarise

library(dplyr)

Data

set.seed(123)
data <- tibble(Field = rep(c("foo", "bar"), each=4),
               Item = rep(c("a", "b", "c", "d"), 2),
               Value = runif(8))

# A tibble: 8 x 3
Field Item   Value
<chr> <chr>  <dbl>
1 foo   a     0.288 
2 foo   b     0.788 
3 foo   c     0.409 
4 foo   d     0.883 
5 bar   a     0.940 
6 bar   b     0.0456
7 bar   c     0.528 
8 bar   d     0.892

Result

data %>% 
  mutate(Item = if_else(Item == "a","b",Item)) %>% 
  group_by(Field,Item) %>% 
  summarise(Value = sum(Value,na.rm = TRUE)) %>% 
  ungroup()

# A tibble: 6 x 3
Field Item  Value
<chr> <chr> <dbl>
1 bar   b     0.986
2 bar   c     0.528
3 bar   d     0.892
4 foo   b     1.08 
5 foo   c     0.409
6 foo   d     0.883

CodePudding user response:

Replace the 'a' with 'b' in 'Item' column using case_when, use both 'Field' and 'item' as grouping columns and get the sum of "Value" in summarise

library(dplyr)
data %>%
    group_by(Field, Item = case_when(Item == 'a' ~ 'b', TRUE ~ Item)) %>% 
    summarise(Value = sum(Value, na.rm = TRUE), .groups= 'drop')

-output

# A tibble: 6 × 3
  Field Item  Value
  <chr> <chr> <dbl>
1 bar   b     0.618
2 bar   c     0.700
3 bar   d     0.813
4 foo   b     0.958
5 foo   c     0.705
6 foo   d     0.120

data

data <- structure(list(Field = c("foo", "foo", "foo", "foo", "bar", "bar", 
"bar", "bar"), Item = c("a", "b", "c", "d", "a", "b", "c", "d"
), Value = c(0.8167347, 0.1416642, 0.7054814, 0.1196948, 0.36045, 
0.2573068, 0.700304, 0.8131556)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))
  • Related