I have two data frames that look like this
data1<-data.frame(name=c("Jessica_Smith","John_Smith","John_Smith") ,
max_year=c(2000,1989,2005),
min_year=c(1990,1989,2001))
data2<-data.frame(name=c("Jessica_Smith","John_Smith","John_Smith") ,
year_of_birth=c(1995,1989,2002),unique_id=c("aby37","nf93","fnd34"))
I would like to merge the data frames such that for each person in data1 I have a unique_id. If possible I would also like to get the nearest integer match for those for whom there is no overlap. To clarify I can't just left_join on name because some entries have the same name but different dates of birth.
I'm relatively new to R so I don't even really know where to begin.
Much appreciated!
CodePudding user response:
You may try,
data1 %>%
left_join(data2, by = "name") %>%
rowwise %>%
mutate(year_key = min_year <= year_of_birth && year_of_birth <= max_year) %>%
filter(year_key == TRUE) %>%
select(-year_key)
name max_year min_year year_of_birth unique_id
<chr> <dbl> <dbl> <dbl> <chr>
1 Jessica_Smith 2000 1990 1995 aby37
2 John_Smith 1989 1989 1989 nf93
3 John_Smith 2005 2001 2002 fnd34
CodePudding user response:
With data.table
you can to non-equal joins:
library(data.table)
data1<-data.table(name=c("Jessica_Smith","John_Smith","John_Smith") ,
max_year=c(2000,1989,2005),
min_year=c(1990,1989,2001))
data2<-data.table(name=c("Jessica_Smith","John_Smith","John_Smith") ,
year_of_birth=c(1995,1989,2002),unique_id=c("aby37","nf93","fnd34"))
data2[data1, .(name, year_of_birth, unique_id),
on = .(name, year_of_birth <= max_year, year_of_birth >= min_year)]
name year_of_birth unique_id
1: Jessica_Smith 2000 aby37
2: John_Smith 1989 nf93
3: John_Smith 2005 fnd34