Lag multiple but unknown number of rows by group in R

Time:10-01

I have data that looks like this:

library(dplyr)

set.seed(123)

# Four individuals with five address records each
individual <- rep(c('John Doe','Peter Gynn','Jolie Hope', 'Pam Rye'), each = 5)
address <- c('king street', 'market street', 'montgomery road', 'princes ave')
address <- sample(address, size = 20, replace = TRUE)

dat <- data.frame(individual, address)

# Give each individual a numeric id and sort by it
dat <- dat %>%
  group_by(individual) %>%
  mutate(id = cur_group_id()) %>%
  arrange(id)

I would like to see whether an individual, indicated by an id, had resided at the same address previously.

Let's look at the first rows.

head(dat)

  individual address            id
  <chr>      <chr>           <int>
1 John Doe   montgomery road     1
2 John Doe   montgomery road     1
3 John Doe   montgomery road     1
4 John Doe   market street       1
5 John Doe   montgomery road     1
6 Jolie Hope princes ave         2

John Doe first resided at montgomery road, then at market street, and then moved back to montgomery road. To see whether he had previously resided at montgomery road, I could just write:

dat %>%
  group_by(id) %>%
  mutate(ifelse(lag(address, 2) == address, 1, 0))

But that solution is too specific and will not hold up as the table grows. Is there a way to see if an id had resided at the same address in any previous row, and not just 2 or 3 rows (or whatever is specified) before?

Answer:

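We may use duplicated(), which flags every value that has already appeared earlier in the vector. Applied within each id group, it marks any row whose address that individual has resided at in any previous row: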

library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(new = duplicated(address))
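
To see what duplicated() returns, here is a minimal base R sketch. The first part uses John Doe's five addresses as printed by head(dat) in the question. The second part uses a small made-up table (toy and its rows are hypothetical, not the question's sampled data) to show that calling duplicated() on the id and address columns together should give the same per-individual flag without dplyr.

```r
# John Doe's addresses in row order, copied from the head(dat) output
address <- c("montgomery road", "montgomery road", "montgomery road",
             "market street", "montgomery road")

# TRUE wherever the same value occurs at any earlier position
seen_before <- duplicated(address)
seen_before
# [1] FALSE  TRUE  TRUE FALSE  TRUE

# 1/0 coding, as in the ifelse() attempt from the question
seen_before_int <- as.integer(seen_before)

# Hypothetical two-person table to illustrate the grouped case
toy <- data.frame(
  id      = c(1, 1, 1, 2, 2),
  address = c("king street", "market street", "king street",
              "king street", "princes ave")
)

# A row is flagged only if the same id/address pair occurred in an earlier row,
# so id 2 living at king street is not counted as a repeat of id 1
toy$new <- duplicated(toy[c("id", "address")])
toy$new
# [1] FALSE FALSE  TRUE FALSE FALSE
```

Note that duplicated() also flags consecutive repeats (rows 2 and 3 for John Doe), so it answers "has this address appeared in any earlier row", not "did the individual move away and come back".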