I want to find the change of the height at each id for each event (B,C,D) from the initial event (A). For example, the change of A1 to B1 is 5 or A1 to C1 is 11.
df <- data.frame(event=c(rep('A',4),rep('B',4),rep('C',4),rep('D',4)), id=rep(1:4,4),height = c(1,4,2,4,6,8,9,6,12,15,13,16,21,24,26,24))
event id height
1 A 1 1
2 A 2 4
3 A 3 2
4 A 4 4
5 B 1 6
6 B 2 8
7 B 3 9
8 B 4 6
9 C 1 12
10 C 2 15
11 C 3 13
12 C 4 16
13 D 1 21
14 D 2 24
15 D 3 26
16 D 4 24
To try to answer this I made a separate dataframe of just initial event points and then subtracted them from each event group.
df_a <- filter(df,event =="A") # filtered only initial event points
df %>%
group_by(event) %>% # grouped events
mutate(growth = height - df_a$height) # subtracted grouped event points by the initial event
event id height growth
1 A 1 1 0
2 A 2 4 0
3 A 3 2 0
4 A 4 4 0
5 B 1 6 5
6 B 2 8 4
7 B 3 9 7
8 B 4 6 2
9 C 1 12 11
10 C 2 15 11
11 C 3 13 11
12 C 4 16 12
13 D 1 21 20
14 D 2 24 20
15 D 3 26 24
16 D 4 24 20
But I am worried about duplicate, missing, or mistyped events or IDs that might throw off this vectored approach. I need suggestions on how to confirm that I will find the correct change for each respective ID. (A1 to B1, C1, or D1) and (A2 to B2, C2, D2) and so on.
Can I do this with looping? I am not very confident with looping but could looping accomplish this by confirming that only matching IDs can be used?
Or how can I confirm that every initial ID has a respective ID(or NA) at each event so the vectored approach will work 100% of the time.
Thank you.
CodePudding user response:
Do a grouping by 'id' and then subtract from subsetting the 'height' where 'event' is 'A'. Just to avoid for any missing values use in%
instead of ==
(as %in%
returns FALSE for NA whereas ==
returns NA) and also, after the subset, get the first element ([1]
) in case there are duplicates
library(dplyr)
df %>%
group_by(id) %>%
mutate(growth = height - height[event %in% 'A'][1]) %>%
ungroup
-output
# A tibble: 16 x 4
event id height growth
<chr> <int> <int> <int>
1 A 1 1 0
2 A 2 4 0
3 A 3 2 0
4 A 4 4 0
5 B 1 6 5
6 B 2 8 4
7 B 3 9 7
8 B 4 6 2
9 C 1 12 11
10 C 2 15 11
11 C 3 13 11
12 C 4 16 12
13 D 1 21 20
14 D 2 24 20
15 D 3 26 24
16 D 4 24 20
We can also test where there are no 'A's in a 'id'
> df %>%
slice(-1) %>% # removed the first row
group_by(id) %>%
mutate(growth = height - height[event %in% 'A'][1]) %>%
ungroup
# A tibble: 15 x 4
event id height growth
<chr> <int> <int> <int>
1 A 2 4 0
2 A 3 2 0
3 A 4 4 0
4 B 1 6 NA # note the NAs for those cases
5 B 2 8 4
6 B 3 9 7
7 B 4 6 2
8 C 1 12 NA
9 C 2 15 11
10 C 3 13 11
11 C 4 16 12
12 D 1 21 NA
13 D 2 24 20
14 D 3 26 24
15 D 4 24 20
data
df <- structure(list(event = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "C", "D", "D", "D", "D"), id = c(1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), height = c(1L,
4L, 2L, 4L, 6L, 8L, 9L, 6L, 12L, 15L, 13L, 16L, 21L, 24L, 26L,
24L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16"))