I have a data.frame containing items that are either repeated ('r') or not repeated ('nr'). For each item, I would like to calculate the distance in rows between that item and its immediately preceding occurrence (NA if the item has not occurred before).
The result should match the 'distance' column below:
word <- c('a', 'b', 'c', 'b', 'd', 'e', 'e', 'f', 'b')
repeated <- c('nr', 'r', 'nr', 'r', 'nr', 'r', 'r', 'nr', 'r')
ds <- data.frame(word, repeated)
ds$distance <- c(NA, NA, NA, 2, NA, NA, 1, NA, 5)
Does anyone know how to solve this? Thank you so much for your help and time!
Answer:
Here is a simple solution using the dplyr package:
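library(dplyr)

ds %>%
  mutate(hlp = row_number()) %>%       # helper column: each row's position
  group_by(word) %>%                   # compare only rows holding the same word
  mutate(distance = hlp - lag(hlp))    # gap to the previous occurrence; NA for the first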
  word  repeated   hlp distance
  <chr> <chr>    <int>    <int>
1 a     nr           1       NA
2 b     r            2       NA
3 c     nr           3       NA
4 b     r            4        2
5 d     nr           5       NA
6 e     r            6       NA
7 e     r            7        1
8 f     nr           8       NA
9 b     r            9        5
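
If you would rather avoid the extra dependency, the same idea (difference the row positions within each word) can probably be written in base R with ave(). This is only a sketch, not part of the answer above: it rebuilds the example data from the question, and the name idx is just an illustrative placeholder.

```r
# Rebuild the example data from the question.
word <- c('a', 'b', 'c', 'b', 'd', 'e', 'e', 'f', 'b')
repeated <- c('nr', 'r', 'nr', 'r', 'nr', 'r', 'r', 'nr', 'r')
ds <- data.frame(word, repeated)

# For each word, take the row positions where it occurs and difference them.
# The first occurrence has no predecessor, so it is padded with NA.
ds$distance <- ave(
  seq_len(nrow(ds)),   # row positions 1..n
  ds$word,             # grouping variable
  FUN = function(idx) c(NA, diff(idx))
)

print(ds)
```

Unlike the dplyr version, this writes the result straight into ds and needs no helper column, so there is nothing to drop or ungroup afterwards.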