I have the following data frame:
df <- data.frame(
ID = c(11041,11041,11041,11041,11041,11041,11042,11042,11042,11063,11063),
p = c(2.9,3.6,4.8,2.6,2.2,3.9,6.5,2.9,1.4,0.7,5.1)
)
Which gives this output:
ID p
1 11041 2.9
2 11041 3.6
3 11041 4.8
4 11041 2.6
5 11041 2.2
6 11041 3.9
7 11042 6.5
8 11042 2.9
9 11042 1.4
10 11063 0.7
11 11063 5.1
I am trying to create new columns for the last n values (let's say 3) by ID
, so my new data frame would look like so:
ID p p1.1 p1.2 p1.3
1 11041 2.9 NA NA 2.9
2 11041 3.6 NA 2.9 3.6
3 11041 4.8 2.9 3.6 4.8
4 11041 2.6 3.6 4.8 2.6
5 11041 2.2 4.8 2.6 2.2
6 11041 3.9 2.6 2.2 3.9
7 11042 6.5 NA NA 6.5
8 11042 2.9 NA 6.5 2.9
9 11042 1.4 6.5 2.9 1.4
10 11063 0.7 NA NA 0.7
11 11063 5.1 NA 0.7 5.1
Ideally, I would like to be able to specify the last n values, so that if I want the last 5 (creating p.1, p.2, p.3, p.4, p.5) I can do this. But n could be any number, like 3,5,10,15, etc.
With dplyr
, I have tried both spread
and pivot_wider
but could not get them to work.
CodePudding user response:
You can use data.table shift
like this:
library(data.table)
setDT(df)[, c(paste0("p1.",1:3)):=shift(p, 2:0), by=ID][]
Output:
ID p p1.1 p1.2 p1.3
1: 11041 2.9 NA NA 2.9
2: 11041 3.6 NA 2.9 3.6
3: 11041 4.8 2.9 3.6 4.8
4: 11041 2.6 3.6 4.8 2.6
5: 11041 2.2 4.8 2.6 2.2
6: 11041 3.9 2.6 2.2 3.9
7: 11042 6.5 NA NA 6.5
8: 11042 2.9 NA 6.5 2.9
9: 11042 1.4 6.5 2.9 1.4
10: 11063 0.7 NA NA 0.7
11: 11063 5.1 NA 0.7 5.1
One way to generalize would be as follows:
- Make helper function
f <- function(v,n) {
setNames(shift(v,(n-1):0), paste0(deparse(substitute(v)),".",1:n))
}
- Apply function by ID; here I use
n=5
df[, f(p,5), by=ID]
Output:
ID p.1 p.2 p.3 p.4 p.5
1: 11041 NA NA NA NA 2.9
2: 11041 NA NA NA 2.9 3.6
3: 11041 NA NA 2.9 3.6 4.8
4: 11041 NA 2.9 3.6 4.8 2.6
5: 11041 2.9 3.6 4.8 2.6 2.2
6: 11041 3.6 4.8 2.6 2.2 3.9
7: 11042 NA NA NA NA 6.5
8: 11042 NA NA NA 6.5 2.9
9: 11042 NA NA 6.5 2.9 1.4
10: 11063 NA NA NA NA 0.7
11: 11063 NA NA NA 0.7 5.1