Home > Software design >  How to group variables then subtract by rows in R?
How to group variables then subtract by rows in R?

Time:12-28

I am trying to group the variables group, type, and year. Each group, type, and year have a particular code that changes from year to year. I want to create a column called "difference" where, if the group and type has a code of 200 in one year and 210 the next year, the "difference" column will register it as an increase in 10.

group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
          1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
          1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
          150, 150, 100, 200, 200, 200, 350, 320, 300,
          130, 170, 90, 210, 90, 80, 310, 300, 320)

df <- data.frame(group, type, year, code)

This is how the df looks like:

 group   type year code
1      A  small 1995  100
2      A medium 1995  100
3      A  large 1995  100
4      B  small 1995  200
5      B medium 1995  200
6      B  large 1995  200
7      C  small 1995  300
8      C medium 1995  300
9      C  large 1995  300
10     A  small 1996  150
11     A medium 1996  150
12     A  large 1996  100
13     B  small 1996  200
14     B medium 1996  200
15     B  large 1996  200
16     C  small 1996  350
17     C medium 1996  320
18     C  large 1996  300
19     A  small 1997  130
20     A medium 1997  170
21     A  large 1997   90
22     B  small 1997  210
23     B medium 1997   90
24     B  large 1997   80
25     C  small 1997  310
26     C medium 1997  300
27     C  large 1997  320

I want the following output:

group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
          1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
          1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
          150, 150, 100, 200, 200, 200, 350, 320, 300,
          130, 170, 90, 210, 90, 80, 310, 300, 320)
difference <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
               50, 50, 0, 0, 0, 0, 50, 20, 0,
               -20, 20, -10, 10, 110, 120, -40, -20, 0)

df2 <- data.frame(group, type, year, code, difference)

   group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90        110
24     B  large 1997   80        120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320          0

This is what I tried:

df3 <- df2 %>%
  group_by(group, type, year) %>%
  mutate(difference = code - lag(code))

The problem is that the lag seems to not take the grouping into consideration and is instead just subtracting from the row right before it. Any suggestions?

CodePudding user response:

As @IRTFM already points out. group only by group and type. It gives almost the same output. Note the last row is different.

library(dplyr)

df %>% 
  group_by(group, type) %>% 
  mutate(difference= code - lag(code)) %>% 
  data.frame()
 group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90       -110
24     B  large 1997   80       -120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320         20

CodePudding user response:

You could use diff in ave.

dat[order(dat$group, dat$type), ] |>
  transform(diff=ave(code, group, type, FUN=\(x) c(NA, diff(x)))) |>
  (\(x) x[order(as.numeric(rownames(x))), ])()  ## optional, to reorder rows
#    group   type year code diff
# 1      A  small 1995  100   NA
# 2      A medium 1995  100   NA
# 3      A  large 1995  100   NA
# 4      B  small 1995  200   NA
# 5      B medium 1995  200   NA
# 6      B  large 1995  200   NA
# 7      C  small 1995  300   NA
# 8      C medium 1995  300   NA
# 9      C  large 1995  300   NA
# 10     A  small 1996  150   50
# 11     A medium 1996  150   50
# 12     A  large 1996  100    0
# 13     B  small 1996  200    0
# 14     B medium 1996  200    0
# 15     B  large 1996  200    0
# 16     C  small 1996  350   50
# 17     C medium 1996  320   20
# 18     C  large 1996  300    0
# 19     A  small 1997  130  -20
# 20     A medium 1997  170   20
# 21     A  large 1997   90  -10
# 22     B  small 1997  210   10
# 23     B medium 1997   90 -110
# 24     B  large 1997   80 -120
# 25     C  small 1997  310  -40
# 26     C medium 1997  300  -20
# 27     C  large 1997  320   20

Note: R >= 4.1 used

  • Related