Home > Enterprise >  Calculate percent change based on conditional mean in another column per id in R
Calculate percent change based on conditional mean in another column per id in R

Time:07-28

I am attempting to calculate a percent change column based on a conditional mean in another column, all while ensuring that I am grouping by the appropriate ID.

The sample dataset is as follows:

library(data.table)
dat1 <- data.table(id = sample.int(n = 10, size = 20, replace = T), 
                   income = sample.int(n = 5000, size = 20, replace = T),
                   condition = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0),
                   x = runif(20, 50, 100)
                   )
> dat1
    id income condition        x
 1:  2   4144         0 68.81022
 2:  6   2775         1 87.00959
 3:  6   4821         1 81.38100
 4:  2   2468         1 66.68466
 5:  1   3852         1 50.46425
 6:  6   1977         0 75.05786
 7: 10   1570         0 99.89615
 8:  6   3559         0 76.53732
 9: 10   1828         0 69.28823
10:  4   2807         1 74.27919
11:  9   2426         1 64.66952
12:  3   3100         0 59.43129
13:  2   4068         0 90.92693
14:  9   4536         0 73.74375
15:  9    427         0 91.74090
16:  5   2151         1 58.83286
17:  1   3441         1 52.48493
18: 10   1553         0 75.16166
19:  5   2223         1 65.56480
20:  1   2890         0 87.76247

First I added a column for the income based on the condition (I used na.rm due to the actual dataset containing NAs).

dat1[, conditionIncome := fcase(condition == 0, mean(income, na.rm = T), condition == 1, mean(income, na.rm = T)), by = .(id, condition)]
> dat1
    id income condition        x conditionIncome
 1:  2   4144         0 68.81022       4106.000
 2:  6   2775         1 87.00959       3798.000
 3:  6   4821         1 81.38100       3798.000
 4:  2   2468         1 66.68466       2468.000
 5:  1   3852         1 50.46425       3646.500
 6:  6   1977         0 75.05786       2768.000
 7: 10   1570         0 99.89615       1650.333
 8:  6   3559         0 76.53732       2768.000
 9: 10   1828         0 69.28823       1650.333
10:  4   2807         1 74.27919       2807.000
11:  9   2426         1 64.66952       2426.000
12:  3   3100         0 59.43129       3100.000
13:  2   4068         0 90.92693       4106.000
14:  9   4536         0 73.74375       2481.500
15:  9    427         0 91.74090       2481.500
16:  5   2151         1 58.83286       2187.000
17:  1   3441         1 52.48493       3646.500
18: 10   1553         0 75.16166       1650.333
19:  5   2223         1 65.56480       2187.000
20:  1   2890         0 87.76247       2890.000

Next, I attempting to create a percent change column to calculate the income change during one condition compared to the income change during the other condition. I tried something along the lines of this:

dat1[, incomeChange := .(.I[condition == 1, conditionIncome]/.I[condition == 0, conditionIncome]*100), by = .(id, condition)]

The above code does not work, but this was my thought process thus far. Any ideas?

CodePudding user response:

We may need

dat1[,  conditionIncome := mean(income, na.rm = TRUE), .(id, condition)]
dat1[, incomeChange := first(conditionIncome[
     condition == 1])/first(conditionIncome[condition == 0]), id]
  • Related