I am attempting to calculate a percent change column based on a conditional mean in another column, all while ensuring that I am grouping by the appropriate ID.
The sample dataset is as follows:
library(data.table)
dat1 <- data.table(id = sample.int(n = 10, size = 20, replace = T),
income = sample.int(n = 5000, size = 20, replace = T),
condition = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0),
x = runif(20, 50, 100)
)
> dat1
id income condition x
1: 2 4144 0 68.81022
2: 6 2775 1 87.00959
3: 6 4821 1 81.38100
4: 2 2468 1 66.68466
5: 1 3852 1 50.46425
6: 6 1977 0 75.05786
7: 10 1570 0 99.89615
8: 6 3559 0 76.53732
9: 10 1828 0 69.28823
10: 4 2807 1 74.27919
11: 9 2426 1 64.66952
12: 3 3100 0 59.43129
13: 2 4068 0 90.92693
14: 9 4536 0 73.74375
15: 9 427 0 91.74090
16: 5 2151 1 58.83286
17: 1 3441 1 52.48493
18: 10 1553 0 75.16166
19: 5 2223 1 65.56480
20: 1 2890 0 87.76247
First I added a column for the income based on the condition (I used na.rm due to the actual dataset containing NAs).
dat1[, conditionIncome := fcase(condition == 0, mean(income, na.rm = T), condition == 1, mean(income, na.rm = T)), by = .(id, condition)]
> dat1
id income condition x conditionIncome
1: 2 4144 0 68.81022 4106.000
2: 6 2775 1 87.00959 3798.000
3: 6 4821 1 81.38100 3798.000
4: 2 2468 1 66.68466 2468.000
5: 1 3852 1 50.46425 3646.500
6: 6 1977 0 75.05786 2768.000
7: 10 1570 0 99.89615 1650.333
8: 6 3559 0 76.53732 2768.000
9: 10 1828 0 69.28823 1650.333
10: 4 2807 1 74.27919 2807.000
11: 9 2426 1 64.66952 2426.000
12: 3 3100 0 59.43129 3100.000
13: 2 4068 0 90.92693 4106.000
14: 9 4536 0 73.74375 2481.500
15: 9 427 0 91.74090 2481.500
16: 5 2151 1 58.83286 2187.000
17: 1 3441 1 52.48493 3646.500
18: 10 1553 0 75.16166 1650.333
19: 5 2223 1 65.56480 2187.000
20: 1 2890 0 87.76247 2890.000
Next, I attempting to create a percent change column to calculate the income change during one condition compared to the income change during the other condition. I tried something along the lines of this:
dat1[, incomeChange := .(.I[condition == 1, conditionIncome]/.I[condition == 0, conditionIncome]*100), by = .(id, condition)]
The above code does not work, but this was my thought process thus far. Any ideas?
CodePudding user response:
We may need
dat1[, conditionIncome := mean(income, na.rm = TRUE), .(id, condition)]
dat1[, incomeChange := first(conditionIncome[
condition == 1])/first(conditionIncome[condition == 0]), id]