I am trying to group the variables group, type, and year. Each group, type, and year have a particular code that changes from year to year. I want to create a column called "difference" where, if the group and type has a code of 200 in one year and 210 the next year, the "difference" column will register it as an increase in 10.
group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
df <- data.frame(group, type, year, code)
This is how the df looks like:
group type year code
1 A small 1995 100
2 A medium 1995 100
3 A large 1995 100
4 B small 1995 200
5 B medium 1995 200
6 B large 1995 200
7 C small 1995 300
8 C medium 1995 300
9 C large 1995 300
10 A small 1996 150
11 A medium 1996 150
12 A large 1996 100
13 B small 1996 200
14 B medium 1996 200
15 B large 1996 200
16 C small 1996 350
17 C medium 1996 320
18 C large 1996 300
19 A small 1997 130
20 A medium 1997 170
21 A large 1997 90
22 B small 1997 210
23 B medium 1997 90
24 B large 1997 80
25 C small 1997 310
26 C medium 1997 300
27 C large 1997 320
I want the following output:
group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
difference <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
50, 50, 0, 0, 0, 0, 50, 20, 0,
-20, 20, -10, 10, 110, 120, -40, -20, 0)
df2 <- data.frame(group, type, year, code, difference)
group type year code difference
1 A small 1995 100 NA
2 A medium 1995 100 NA
3 A large 1995 100 NA
4 B small 1995 200 NA
5 B medium 1995 200 NA
6 B large 1995 200 NA
7 C small 1995 300 NA
8 C medium 1995 300 NA
9 C large 1995 300 NA
10 A small 1996 150 50
11 A medium 1996 150 50
12 A large 1996 100 0
13 B small 1996 200 0
14 B medium 1996 200 0
15 B large 1996 200 0
16 C small 1996 350 50
17 C medium 1996 320 20
18 C large 1996 300 0
19 A small 1997 130 -20
20 A medium 1997 170 20
21 A large 1997 90 -10
22 B small 1997 210 10
23 B medium 1997 90 110
24 B large 1997 80 120
25 C small 1997 310 -40
26 C medium 1997 300 -20
27 C large 1997 320 0
This is what I tried:
df3 <- df2 %>%
group_by(group, type, year) %>%
mutate(difference = code - lag(code))
The problem is that the lag seems to not take the grouping into consideration and is instead just subtracting from the row right before it. Any suggestions?
CodePudding user response:
As @IRTFM already points out. group only by group
and type
.
It gives almost the same output. Note the last row is different.
library(dplyr)
df %>%
group_by(group, type) %>%
mutate(difference= code - lag(code)) %>%
data.frame()
group type year code difference
1 A small 1995 100 NA
2 A medium 1995 100 NA
3 A large 1995 100 NA
4 B small 1995 200 NA
5 B medium 1995 200 NA
6 B large 1995 200 NA
7 C small 1995 300 NA
8 C medium 1995 300 NA
9 C large 1995 300 NA
10 A small 1996 150 50
11 A medium 1996 150 50
12 A large 1996 100 0
13 B small 1996 200 0
14 B medium 1996 200 0
15 B large 1996 200 0
16 C small 1996 350 50
17 C medium 1996 320 20
18 C large 1996 300 0
19 A small 1997 130 -20
20 A medium 1997 170 20
21 A large 1997 90 -10
22 B small 1997 210 10
23 B medium 1997 90 -110
24 B large 1997 80 -120
25 C small 1997 310 -40
26 C medium 1997 300 -20
27 C large 1997 320 20
CodePudding user response:
You could use diff
in ave
.
dat[order(dat$group, dat$type), ] |>
transform(diff=ave(code, group, type, FUN=\(x) c(NA, diff(x)))) |>
(\(x) x[order(as.numeric(rownames(x))), ])() ## optional, to reorder rows
# group type year code diff
# 1 A small 1995 100 NA
# 2 A medium 1995 100 NA
# 3 A large 1995 100 NA
# 4 B small 1995 200 NA
# 5 B medium 1995 200 NA
# 6 B large 1995 200 NA
# 7 C small 1995 300 NA
# 8 C medium 1995 300 NA
# 9 C large 1995 300 NA
# 10 A small 1996 150 50
# 11 A medium 1996 150 50
# 12 A large 1996 100 0
# 13 B small 1996 200 0
# 14 B medium 1996 200 0
# 15 B large 1996 200 0
# 16 C small 1996 350 50
# 17 C medium 1996 320 20
# 18 C large 1996 300 0
# 19 A small 1997 130 -20
# 20 A medium 1997 170 20
# 21 A large 1997 90 -10
# 22 B small 1997 210 10
# 23 B medium 1997 90 -110
# 24 B large 1997 80 -120
# 25 C small 1997 310 -40
# 26 C medium 1997 300 -20
# 27 C large 1997 320 20
Note: R >= 4.1 used