Given a tibble looking as the one below, I'm trying to use the Tidyverse to perform a conditional sum based on the value of Item
in each of the two Fields. Specifically, both for foo
and bar
, I'd like to sum the value corresponding to item a
to that of item b
and then remove the row of the former. The result I'm looking for is exemplified in table 2.
data <- tibble(Field = rep(c("foo", "bar"), each=4),
Item = rep(c("a", "b", "c", "d"), 2),
Value = runif(8))
# table 1 # table 2
| Field | Item | Value | | Field | Item | Value |
|-------|------|-----------| |-------|------|-----------|
| foo | a | 0.8167347 | | foo | b | 0.9583989 | <== 0.8167347 0.1416642
| foo | b | 0.1416642 | | foo | c | 0.7054814 |
| foo | c | 0.7054814 | | foo | d | 0.1196948 |
| foo | d | 0.1196948 | |--------------------------|
|--------------------------| | bar | b | 0.6177568 | <== 0.3604500 0.2573068
| bar | a | 0.3604500 | | bar | c | 0.7003040 |
| bar | b | 0.2573068 | | bar | d | 0.8131556 |
| bar | c | 0.7003040 |
| bar | d | 0.8131556 |
So far I didn't manage to get anywhere near the expected result. I know how to use dplyr
's grouping function to isolate items belonging to either one of the two fields, but I have no idea how to select the value of a
and sum it to b
after the grouping has been performed.
CodePudding user response:
You can change Item, to receive b
when it is equal a
, then summarise
library(dplyr)
Data
set.seed(123)
data <- tibble(Field = rep(c("foo", "bar"), each=4),
Item = rep(c("a", "b", "c", "d"), 2),
Value = runif(8))
# A tibble: 8 x 3
Field Item Value
<chr> <chr> <dbl>
1 foo a 0.288
2 foo b 0.788
3 foo c 0.409
4 foo d 0.883
5 bar a 0.940
6 bar b 0.0456
7 bar c 0.528
8 bar d 0.892
Result
data %>%
mutate(Item = if_else(Item == "a","b",Item)) %>%
group_by(Field,Item) %>%
summarise(Value = sum(Value,na.rm = TRUE)) %>%
ungroup()
# A tibble: 6 x 3
Field Item Value
<chr> <chr> <dbl>
1 bar b 0.986
2 bar c 0.528
3 bar d 0.892
4 foo b 1.08
5 foo c 0.409
6 foo d 0.883
CodePudding user response:
Replace the 'a' with 'b' in 'Item' column using case_when
, use both 'Field' and 'item' as grouping columns and get the sum
of "Value" in summarise
library(dplyr)
data %>%
group_by(Field, Item = case_when(Item == 'a' ~ 'b', TRUE ~ Item)) %>%
summarise(Value = sum(Value, na.rm = TRUE), .groups= 'drop')
-output
# A tibble: 6 × 3
Field Item Value
<chr> <chr> <dbl>
1 bar b 0.618
2 bar c 0.700
3 bar d 0.813
4 foo b 0.958
5 foo c 0.705
6 foo d 0.120
data
data <- structure(list(Field = c("foo", "foo", "foo", "foo", "bar", "bar",
"bar", "bar"), Item = c("a", "b", "c", "d", "a", "b", "c", "d"
), Value = c(0.8167347, 0.1416642, 0.7054814, 0.1196948, 0.36045,
0.2573068, 0.700304, 0.8131556)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))