A grouped data frame:
grp_diamonds <- diamonds %>%
group_by(cut, color) %>%
mutate(rn = row_number()) %>%
arrange(cut, color, rn) %>%
mutate(cumprice = cumsum(price))
Looks like:
grp_diamonds
# A tibble: 53,940 × 12
# Groups: cut, color [35]
carat cut color clarity depth table price x y z rn cumprice
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int> <int>
1 0.75 Fair D SI2 64.6 57 2848 5.74 5.72 3.7 1 2848
2 0.71 Fair D VS2 56.9 65 2858 5.89 5.84 3.34 2 5706
3 0.9 Fair D SI2 66.9 57 2885 6.02 5.9 3.99 3 8591
4 1 Fair D SI2 69.3 58 2974 5.96 5.87 4.1 4 11565
5 1.01 Fair D SI2 64.6 56 3003 6.31 6.24 4.05 5 14568
6 0.73 Fair D VS1 66 54 3047 5.56 5.66 3.7 6 17615
7 0.71 Fair D VS2 64.7 58 3077 5.61 5.58 3.62 7 20692
8 0.91 Fair D SI2 62.5 66 3079 6.08 6.01 3.78 8 23771
9 0.9 Fair D SI2 65.9 59 3205 6 5.95 3.94 9 26976
10 0.9 Fair D SI2 66 58 3205 6 5.97 3.95 10 30181
Within each group, I would like to add a new field 'GROWTH_6_7' which is the delta between cumprice at rn = 7 - rn = 6.
Read documentation and tried and failed using cur_data()
with mutate. Maybe that's the right path or maybe there's a 'better' way?
How can I mutate a new field within each group 'GROWTH_6_7' that is the difference between rn == 7 and rn ==6 cumprice?
CodePudding user response:
GrowDelta <- function(data, start_row = 6, end_row = 7){
data$cumprice[end_row] - data$cumprice[start_row]
}
grp_diamonds %>%
summarize(GROWTH_6_7 = GrowDelta(cur_data()))
mutate
instead of summarize
should work, too. It will just repeat it for every row in the group instead of just once for each group, i.e. will result in tibble with the same number of rows as the data set. Using summarize
will give you a 35 x 3 tibble.
CodePudding user response:
You may try group_modify
:
Code
grow <- grp_diamonds %>%
group_by(cut, color) %>%
group_modify(~{
.x %>%
mutate(GROWTH_6_7 = .x$cumprice[.x$rn == 7] - .x$cumprice[.x$rn == 6])
})
Output
> head(grow)
# A tibble: 6 x 13
# Groups: cut, color [1]
cut color carat clarity depth table price x y z rn cumprice GROWTH_6_7
<ord> <ord> <dbl> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int>
1 Fair D 0.75 SI2 64.6 57 2848 5.74 5.72 3.7 1 2848 3077
2 Fair D 0.71 VS2 56.9 65 2858 5.89 5.84 3.34 2 5706 3077
3 Fair D 0.9 SI2 66.9 57 2885 6.02 5.9 3.99 3 8591 3077
4 Fair D 1 SI2 69.3 58 2974 5.96 5.87 4.1 4 11565 3077
5 Fair D 1.01 SI2 64.6 56 3003 6.31 6.24 4.05 5 14568 3077
6 Fair D 0.73 VS1 66 54 3047 5.56 5.66 3.7 6 17615 3077
CodePudding user response:
We could do this within mutate
itself`
library(dplyr)
grp_diamonds %>%
group_by(cut, color) %>%
mutate(GROWTH_6_7 = cumprice[rn == 7] - cumprice[rn == 6])
-output
# A tibble: 53,940 x 13
# Groups: cut, color [35]
carat cut color clarity depth table price x y z rn cumprice GROWTH_6_7
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int>
1 0.75 Fair D SI2 64.6 57 2848 5.74 5.72 3.7 1 2848 3077
2 0.71 Fair D VS2 56.9 65 2858 5.89 5.84 3.34 2 5706 3077
3 0.9 Fair D SI2 66.9 57 2885 6.02 5.9 3.99 3 8591 3077
4 1 Fair D SI2 69.3 58 2974 5.96 5.87 4.1 4 11565 3077
5 1.01 Fair D SI2 64.6 56 3003 6.31 6.24 4.05 5 14568 3077
6 0.73 Fair D VS1 66 54 3047 5.56 5.66 3.7 6 17615 3077
7 0.71 Fair D VS2 64.7 58 3077 5.61 5.58 3.62 7 20692 3077
8 0.91 Fair D SI2 62.5 66 3079 6.08 6.01 3.78 8 23771 3077
9 0.9 Fair D SI2 65.9 59 3205 6 5.95 3.94 9 26976 3077
10 0.9 Fair D SI2 66 58 3205 6 5.97 3.95 10 30181 3077
# … with 53,930 more rows
If there are cases where there are some missing values, then another option is pivot_wider
library(tidyr)
grp_diamonds %>%
ungroup %>%
select(cut, color, rn, cumprice) %>%
filter(rn %in% 6:7) %>%
pivot_wider(names_from = rn, values_from = cumprice) %>%
transmute(cut, color, GROWTH_6_7 = `7` - `6`) %>%
left_join(grp_diamonds, .)