Difference by subgroup using R

Time:10-19

I have the following dataset:

[image: example dataset]

I want to calculate the difference between the values of the subgroups within each group. The value of subgroup 1 must always be the first term of the subtraction, so the results should be 10-0=10; 0-20=-20; 30-31=-1. I want to do this in R.

[image: desired result]

I know that it would be something like this, but I do not know how to put the sub_group into the code:

library(tidyverse)

    df %>% 
        group_by(group) %>% 
        summarise(difference= diff(value))

Answer:

Edited answer after OP's comment:

The OP clarified that the data are not sorted by sub_group within each group, so I added an arrange() after group_by(). The OP further clarified that the value for sub_group == 1 should always be the first term of the difference.

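Below I demonstrate how to achieve this in an example with 3 sub_groups within every group. The code assumes that 1 is the lowest sub_group value in every group, so that after sorting, the first row of each group is sub_group 1. After computing the differences, I drop each group's first row (sub_group 1), because its difference with itself is always 0.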

library(tidyverse)

df <- tibble(group = rep(LETTERS[1:3], each = 3),
             sub_group = rep(1:3, 3),
             value = c(10,0,5,0,20,15,30,31,10)) 

df
#> # A tibble: 9 × 3
#>   group sub_group value
#>   <chr>     <int> <dbl>
#> 1 A             1    10
#> 2 A             2     0
#> 3 A             3     5
#> 4 B             1     0
#> 5 B             2    20
#> 6 B             3    15
#> 7 C             1    30
#> 8 C             2    31
#> 9 C             3    10

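df |>
  group_by(group) |>
  # sort so that sub_group 1 is the first row of each group
  arrange(group, sub_group) |>
  # subtract every value from the group's sub_group 1 value
  mutate(value = first(value) - value) |>
  # drop the sub_group 1 row itself; slice(-1) also behaves for one-row groups
  slice(-1)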
#> # A tibble: 6 × 3
#> # Groups:   group [3]
#>   group sub_group value
#>   <chr>     <int> <dbl>
#> 1 A             2    10
#> 2 A             3     5
#> 3 B             2   -20
#> 4 B             3   -15
#> 5 C             2    -1
#> 6 C             3    20

Created on 2022-10-18 with reprex v2.0.2
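
If you would rather not rely on the sort order, here is a minimal sketch that looks up the sub_group 1 value by name instead of by position. It is only a sketch, and it assumes that every group contains exactly one row with sub_group == 1 (otherwise mutate() will complain about the size of the result). The shuffled data and the names `df_unsorted`, `difference` and `result` are mine, chosen for illustration.

```r
library(dplyr)

# Same values as in the example above, but deliberately unsorted within groups
df_unsorted <- data.frame(
  group     = rep(c("A", "B", "C"), each = 3),
  sub_group = c(3, 1, 2, 2, 3, 1, 1, 3, 2),
  value     = c(5, 10, 0, 20, 15, 0, 30, 10, 31)
)

result <- df_unsorted |>
  group_by(group) |>
  # reference value is picked by condition, not by row position
  # (assumes exactly one sub_group == 1 row per group)
  mutate(difference = value[sub_group == 1] - value) |>
  # the reference row itself carries no information (difference is 0)
  filter(sub_group != 1) |>
  ungroup() |>
  # sorting here is only cosmetic
  arrange(group, sub_group)

result
```

This should give the same six differences as above (10, 5, -20, -15, -1, 20), this time in a separate `difference` column so that the original `value` column is kept.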

P.S. (from the original answer): In the picture of your example data, the difference shown for group C is wrong; it should read -1. Most people here would appreciate it if you posted your example data as code, or at least as text that can be copied, rather than as a picture.
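
For the layout in the question itself (only two sub_groups per group, one difference per group), a summarise() along the lines of the OP's attempt might look like the sketch below. The data frame `df2` is my own reconstruction from the numbers quoted in the question (10-0, 0-20, 30-31), not the OP's actual data, and the code assumes exactly one row for sub_group 1 and one for sub_group 2 in every group. Note that diff(value) computes the second value minus the first, which is the opposite sign of what was asked for.

```r
library(dplyr)

# Hypothetical reconstruction of the question's data, stored with
# sub_group 2 before sub_group 1 to show that row order does not matter
df2 <- data.frame(
  group     = rep(c("A", "B", "C"), each = 2),
  sub_group = rep(c(2, 1), times = 3),
  value     = c(0, 10, 20, 0, 31, 30)
)

result2 <- df2 |>
  group_by(group) |>
  # sub_group 1 is always the first term of the subtraction
  summarise(difference = value[sub_group == 1] - value[sub_group == 2])

result2
```

With these numbers the differences come out as 10, -20 and -1 for groups A, B and C.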
