I have data that looks like this:
library(dplyr)
set.seed(123)
# Four individuals with five address records each
individual <- rep(c('John Doe','Peter Gynn','Jolie Hope', 'Pam Rye'), each = 5)
address <- c('king street', 'market street', 'montgomery road', 'princes ave')
address <- sample(address, size = 20, replace = TRUE)
dat <- data.frame(individual, address)
# Assign a numeric id per individual and sort by it
dat <- dat %>%
  group_by(individual) %>%
  mutate(id = cur_group_id()) %>%
  arrange(id)
I would like to see whether an individual, indicated by an id, had resided at the same address previously.
Let's look at the first rows.
head(dat)
  individual address            id
  <chr>      <chr>           <int>
1 John Doe   montgomery road     1
2 John Doe   montgomery road     1
3 John Doe   montgomery road     1
4 John Doe   market street       1
5 John Doe   montgomery road     1
6 Jolie Hope princes ave         2
John Doe first resided at montgomery road, then at market street, and then moved back to montgomery road. To check whether he had previously resided at montgomery road, I could just write:
dat %>%
  group_by(id) %>%
  mutate(ifelse(lag(address, 2) == address, 1, 0))
But that solution is too specific and will not scale as the table grows. Is there a way to check whether an id had resided at the same address in any previous row, and not just 2 or 3 rows (or whatever offset is specified) before?
CodePudding user response:
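We may use duplicated(), which returns TRUE for every element that has already appeared earlier in the vector. Applied within each id group, it flags every row whose address occurred in any previous row for that individual: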
library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(new = duplicated(address))
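
To make the behaviour easier to check by hand, here is a minimal sketch on a small hand-built table. The `toy` data frame and the column names `seen_before` and `returned` are made up for illustration and are not part of the original question. Note that duplicated() also flags consecutive repeats (staying at the same address), so the `returned` column is one possible way, assuming that is what is wanted, to flag only moves back to an earlier address.

```r
library(dplyr)

# Hypothetical toy data (not the sampled data from the question)
toy <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2),
  address = c("montgomery road", "montgomery road", "market street",
              "montgomery road", "princes ave", "king street", "princes ave"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(id) %>%
  mutate(
    # 1 if the address appeared in any earlier row of the same id, else 0
    seen_before = as.integer(duplicated(address)),
    # TRUE only if the address was seen before AND differs from the
    # immediately preceding row, i.e. the individual moved back.
    # lag() yields NA on the first row of a group; FALSE & NA is FALSE.
    returned = duplicated(address) & address != lag(address)
  ) %>%
  ungroup()

print(res)
```

Here `seen_before` is 0 1 0 1 for id 1 and 0 0 1 for id 2, while `returned` is TRUE only on the last row of each id.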