Calculate distance between two occurrences of the same item in R

I have a data.frame containing items that are either repeated (r) or not repeated (nr). For each item, I would like to calculate its distance in rows from its immediately preceding occurrence, i.e. the difference between the two row numbers.

The result should match the 'distance' column below:

word <- c('a', 'b', 'c', 'b', 'd', 'e', 'e', 'f', 'b')
repeated <- c('nr', 'r', 'nr', 'r', 'nr', 'r', 'r', 'nr', 'r')
ds <- data.frame(word, repeated)
# expected result
ds$distance <- c(NA, NA, NA, 2, NA, NA, 1, NA, 5)

Does anyone know how to solve this? Thank you so much for your help and time!

CodePudding user response:

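Here is a simple solution using the dplyr package. It stores each row's position in a helper column, groups by word, and subtracts the previous position within each group: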

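library(dplyr)

ds %>%
  # helper column: position of each row in the full data frame
  mutate(hlp = row_number()) %>%
  group_by(word) %>%
  # within each word, subtract the position of the previous occurrence
  mutate(distance = hlp - lag(hlp)) %>%
  ungroup()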

  word  repeated   hlp distance
  <chr> <chr>    <int>    <int>
1 a     nr           1       NA
2 b     r            2       NA
3 c     nr           3       NA
4 b     r            4        2
5 d     nr           5       NA
6 e     r            6       NA
7 e     r            7        1
8 f     nr           8       NA
9 b     r            9        5
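
If you would rather not depend on dplyr, the same idea can probably be sketched in base R. This is an alternative to the answer above, not part of it. It assumes `ave()` is used to apply a function to the row positions within each `word` group, and it rebuilds the example data so the snippet runs on its own.

```r
# Rebuild the example data from the question
word <- c('a', 'b', 'c', 'b', 'd', 'e', 'e', 'f', 'b')
repeated <- c('nr', 'r', 'nr', 'r', 'nr', 'r', 'r', 'nr', 'r')
ds <- data.frame(word, repeated)

# For each word, take the row positions of its occurrences and compute
# the difference between consecutive positions. The first occurrence of
# a word has no predecessor, so it gets NA.
ds$distance <- ave(
  seq_along(ds$word),
  ds$word,
  FUN = function(pos) c(NA_integer_, diff(pos))
)

ds
```

Unlike the dplyr version, this needs no helper column, because `seq_along()` supplies the row positions directly.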