Home > Net >  Create new columns for last n rows in dataframe by group dplyr
Create new columns for last n rows in dataframe by group dplyr

Time:05-20

I have the following data frame:

df <- data.frame(
  ID  =  c(11041,11041,11041,11041,11041,11041,11042,11042,11042,11063,11063),
  p = c(2.9,3.6,4.8,2.6,2.2,3.9,6.5,2.9,1.4,0.7,5.1)
    )

Which gives this output:

      ID   p
1  11041 2.9
2  11041 3.6
3  11041 4.8
4  11041 2.6
5  11041 2.2
6  11041 3.9
7  11042 6.5
8  11042 2.9
9  11042 1.4
10 11063 0.7
11 11063 5.1

I am trying to create new columns for the last n values (let's say 3) by ID, so my new data frame would look like so:

      ID   p p1.1 p1.2 p1.3
1  11041 2.9   NA   NA  2.9
2  11041 3.6   NA  2.9  3.6
3  11041 4.8  2.9  3.6  4.8
4  11041 2.6  3.6  4.8  2.6
5  11041 2.2  4.8  2.6  2.2
6  11041 3.9  2.6  2.2  3.9
7  11042 6.5   NA   NA  6.5
8  11042 2.9   NA  6.5  2.9
9  11042 1.4  6.5  2.9  1.4
10 11063 0.7   NA   NA  0.7
11 11063 5.1   NA  0.7  5.1

Ideally, I would like to be able to specify the last n values, so that if I want the last 5 (creating p.1, p.2, p.3, p.4, p.5) I can do this. But n could be any number, like 3,5,10,15, etc.

With dplyr, I have tried both spread and pivot_wider but could not get them to work.

CodePudding user response:

You can use data.table shift like this:

library(data.table)
setDT(df)[, c(paste0("p1.",1:3)):=shift(p, 2:0), by=ID][]

Output:

       ID   p p1.1 p1.2 p1.3
 1: 11041 2.9   NA   NA  2.9
 2: 11041 3.6   NA  2.9  3.6
 3: 11041 4.8  2.9  3.6  4.8
 4: 11041 2.6  3.6  4.8  2.6
 5: 11041 2.2  4.8  2.6  2.2
 6: 11041 3.9  2.6  2.2  3.9
 7: 11042 6.5   NA   NA  6.5
 8: 11042 2.9   NA  6.5  2.9
 9: 11042 1.4  6.5  2.9  1.4
10: 11063 0.7   NA   NA  0.7
11: 11063 5.1   NA  0.7  5.1

One way to generalize would be as follows:

  1. Make helper function
f <- function(v,n) {
  setNames(shift(v,(n-1):0), paste0(deparse(substitute(v)),".",1:n))
}
  1. Apply function by ID; here I use n=5
df[, f(p,5), by=ID]

Output:

       ID p.1 p.2 p.3 p.4 p.5
 1: 11041  NA  NA  NA  NA 2.9
 2: 11041  NA  NA  NA 2.9 3.6
 3: 11041  NA  NA 2.9 3.6 4.8
 4: 11041  NA 2.9 3.6 4.8 2.6
 5: 11041 2.9 3.6 4.8 2.6 2.2
 6: 11041 3.6 4.8 2.6 2.2 3.9
 7: 11042  NA  NA  NA  NA 6.5
 8: 11042  NA  NA  NA 6.5 2.9
 9: 11042  NA  NA 6.5 2.9 1.4
10: 11063  NA  NA  NA  NA 0.7
11: 11063  NA  NA  NA 0.7 5.1
  • Related