Home > Blockchain >  How to find the change in value between subsequent groups from an initial group in R?
How to find the change in value between subsequent groups from an initial group in R?

Time:09-22

I want to find the change of the height at each id for each event (B,C,D) from the initial event (A). For example, the change of A1 to B1 is 5 or A1 to C1 is 11.

df <- data.frame(event=c(rep('A',4),rep('B',4),rep('C',4),rep('D',4)), id=rep(1:4,4),height = c(1,4,2,4,6,8,9,6,12,15,13,16,21,24,26,24))
    event id height
1      A  1      1
2      A  2      4
3      A  3      2
4      A  4      4
5      B  1      6
6      B  2      8
7      B  3      9
8      B  4      6
9      C  1     12
10     C  2     15
11     C  3     13
12     C  4     16
13     D  1     21
14     D  2     24
15     D  3     26
16     D  4     24

To try to answer this I made a separate dataframe of just initial event points and then subtracted them from each event group.

df_a <- filter(df,event =="A") # filtered only initial event points

df %>%
  group_by(event) %>% # grouped events
  mutate(growth = height - df_a$height) # subtracted grouped event points by the initial event 


   event    id height growth
 1 A         1      1      0
 2 A         2      4      0
 3 A         3      2      0
 4 A         4      4      0
 5 B         1      6      5
 6 B         2      8      4
 7 B         3      9      7
 8 B         4      6      2
 9 C         1     12     11
10 C         2     15     11
11 C         3     13     11
12 C         4     16     12
13 D         1     21     20
14 D         2     24     20
15 D         3     26     24
16 D         4     24     20

But I am worried about duplicate, missing, or mistyped events or IDs that might throw off this vectored approach. I need suggestions on how to confirm that I will find the correct change for each respective ID. (A1 to B1, C1, or D1) and (A2 to B2, C2, D2) and so on.

Can I do this with looping? I am not very confident with looping but could looping accomplish this by confirming that only matching IDs can be used?

Or how can I confirm that every initial ID has a respective ID(or NA) at each event so the vectored approach will work 100% of the time.

Thank you.

CodePudding user response:

Do a grouping by 'id' and then subtract from subsetting the 'height' where 'event' is 'A'. Just to avoid for any missing values use in% instead of == (as %in% returns FALSE for NA whereas == returns NA) and also, after the subset, get the first element ([1]) in case there are duplicates

library(dplyr)
df %>%
    group_by(id) %>% 
    mutate(growth =  height - height[event %in% 'A'][1]) %>%
    ungroup

-output

# A tibble: 16 x 4
   event    id height growth
   <chr> <int>  <int>  <int>
 1 A         1      1      0
 2 A         2      4      0
 3 A         3      2      0
 4 A         4      4      0
 5 B         1      6      5
 6 B         2      8      4
 7 B         3      9      7
 8 B         4      6      2
 9 C         1     12     11
10 C         2     15     11
11 C         3     13     11
12 C         4     16     12
13 D         1     21     20
14 D         2     24     20
15 D         3     26     24
16 D         4     24     20

We can also test where there are no 'A's in a 'id'

> df %>% 
      slice(-1) %>%  # removed the first row
      group_by(id) %>% 
      mutate(growth =  height - height[event %in% 'A'][1]) %>%
      ungroup
# A tibble: 15 x 4
   event    id height growth
   <chr> <int>  <int>  <int>
 1 A         2      4      0
 2 A         3      2      0
 3 A         4      4      0
 4 B         1      6     NA # note the NAs for those cases
 5 B         2      8      4
 6 B         3      9      7
 7 B         4      6      2
 8 C         1     12     NA
 9 C         2     15     11
10 C         3     13     11
11 C         4     16     12
12 D         1     21     NA
13 D         2     24     20
14 D         3     26     24
15 D         4     24     20

data

df <- structure(list(event = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C", "D", "D", "D", "D"), id = c(1L, 2L, 3L, 4L, 
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), height = c(1L, 
4L, 2L, 4L, 6L, 8L, 9L, 6L, 12L, 15L, 13L, 16L, 21L, 24L, 26L, 
24L)), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16"))
  •  Tags:  
  • r
  • Related