I have a data of this form :
ID FIRM PAY_START PAY_end
1 A 1 359
1 A 360 500
1 B 720 780
1 C 800 930
1 D 934 1200
2 E 1 316
2 E 320 360
3 F 1 339
3 F 340 342
3 G 345 600
I want First to convert PAY_START( which are in Days) into occurences in the Panel 1, 2, 3 like this:
ID FIRM PAY_START PAY_end Occurence
1 A 1 359 1
1 A 360 500 2
1 B 720 780 3
1 C 800 930 4
1 D 934 1200 5
2 E 1 316 1
2 E 320 360 2
3 F 1 339 1
3 F 340 342 2
3 G 345 600 3
And I want to compute the lags of Pay_end:
ID FIRM PAY_START PAY_end Occurence Lag_Pay_end
1 A 1 359 1 0
1 A 360 500 2 141
1 B 720 780 3 280
1 C 800 930 4 150
1 D 934 1200 5 270
2 E 1 316 1 0
2 E 320 360 2 44
3 F 1 339 1 0
3 F 340 342 2 3
3 G 345 600 3 258
Thank you for your help!
CodePudding user response:
a data.table
approach
library(data.table)
DT <- fread("ID FIRM PAY_START PAY_end
1 A 1 359
1 A 360 500
1 B 720 780
1 C 800 930
1 D 934 1200
2 E 1 316
2 E 320 360
3 F 1 339
3 F 340 342
3 G 345 600")
DT[, Occurence := rowid(ID)]
DT[, Lag_Pay_end := PAY_end - shift(PAY_end, type = "lag", fill = PAY_end[1]),
by = .(ID)]
# ID FIRM PAY_START PAY_end Occurence Lag_Pay_end
# 1: 1 A 1 359 1 0
# 2: 1 A 360 500 2 141
# 3: 1 B 720 780 3 280
# 4: 1 C 800 930 4 150
# 5: 1 D 934 1200 5 270
# 6: 2 E 1 316 1 0
# 7: 2 E 320 360 2 44
# 8: 3 F 1 339 1 0
# 9: 3 F 340 342 2 3
#10: 3 G 345 600 3 258
CodePudding user response:
Performing group-wise calculations using ave
.
transform(dat, Occurence=ave(PAY_START, ID, FUN=\(x) as.numeric(as.factor(x))),
Lag_Pay_end=ave(PAY_end, ID, FUN=\(x) c(0, diff(x))))
# ID FIRM PAY_START PAY_end Occurence Lag_Pay_end
# 1 1 A 1 359 1 0
# 2 1 A 360 500 2 141
# 3 1 B 720 780 3 280
# 4 1 C 800 930 4 150
# 5 1 D 934 1200 5 270
# 6 2 E 1 316 1 0
# 7 2 E 320 360 2 44
# 8 3 F 1 339 1 0
# 9 3 F 340 342 2 3
# 10 3 G 345 600 3 258
Data:
dat <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
FIRM = c("A", "A", "B", "C", "D", "E", "E", "F", "F", "G"
), PAY_START = c(1L, 360L, 720L, 800L, 934L, 1L, 320L, 1L,
340L, 345L), PAY_end = c(359L, 500L, 780L, 930L, 1200L, 316L,
360L, 339L, 342L, 600L)), class = "data.frame", row.names = c(NA,
-10L))