Home > OS >  How to combine two data-frames by group and date
How to combine two data-frames by group and date

Time:10-03

I'd like to combine two dataframes d1 and d2 in R. It should be grouped by "ID", but at the end there should only exist one column "date" so that further analysis are possible (It is important to know if the "number"-entry (d2) is linked to the "type"-entry (d1) or how close it is to it)

Lets say d1 and d2 is

d1 <- data.frame(ID = c(1,1,2,2,2,3), 
                 sex = c("male", "male", "male", "male", "male", "female"),
                 date = c("2016-06-21 08:15:00", "2016-06-21 23:00:00",
                          "2017-04-19 07:15:00", "2017-04-20 07:15:00", "2017-04-20 08:15:00",
                          "2017-08-20 23:15:00" ),
                 type = c("horse", "snake", "horse", "horse", "monkey", "fish"))

d2 <- data.frame(ID = c(1,1,1,2,2,3,3), 
                 date = c("2016-06-20 08:15:00", "2016-06-21 22:15:00", "2016-06-22 08:45:00",
                          "2017-04-19 07:15:00", "2017-04-19 09:30:00",
                          "2017-08-19 08:15:00", "2017-08-21 06:30:00"),
                 number = c(65,234,64,234,6243,23,243))

My desired output would be

d12 <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,3,3,3),
                  sex = c(NA, "male", NA, "male", NA, "male", NA, "male", "male" , NA, "female", NA),
                  date = c("2016-06-20 08:15:00","2016-06-21 08:15:00", "2016-06-21 22:15:00", "2016-06-21 23:00:00", "2016-06-22 08:45:00",
                          "2017-04-19 07:15:00", "2017-04-19 09:30:00", "2017-04-20 07:15:00", "2017-04-20 08:15:00",
                          "2017-08-19 08:15:00", "2017-08-20 23:15:00", "2017-08-21 06:30:00" ),
                  type = c(NA, "horse", NA, "snake", NA, "horse", NA, "horse", "monkey", NA, "fish", NA),
                  number= c(65, NA, 234, NA, 64, 234, 6243, NA, NA, 23, NA, 243))

I tried doing this

d12 <- full_join(
  d1 %>% group_by(ID) %>% mutate(time=row_number()),
  d2 %>% group_by(ID) %>% mutate(time=row_number()),
by = c("ID", "time"))

But more or less I just added some columns, I don't really come closer to my desired output. I am very looking forward to your help.

CodePudding user response:

I'm still a little bit puzzled about this question. You could try

library(dplyr)

d1 %>% 
  full_join(d2, by = c("ID", "date")) %>% 
  mutate(date = as.POSIXct(date)) %>% 
  arrange(ID, date) %>% 
  tibble() # this line isn't really necessary

This returns

      ID sex    date                type   number
   <dbl> <chr>  <dttm>              <chr>   <dbl>
 1     1 NA     2016-06-20 08:15:00 NA         65
 2     1 male   2016-06-21 08:15:00 horse      NA
 3     1 NA     2016-06-21 22:15:00 NA        234
 4     1 male   2016-06-21 23:00:00 snake      NA
 5     1 NA     2016-06-22 08:45:00 NA         64
 6     2 male   2017-04-19 07:15:00 horse     234
 7     2 NA     2017-04-19 09:30:00 NA       6243
 8     2 male   2017-04-20 07:15:00 horse      NA
 9     2 male   2017-04-20 08:15:00 monkey     NA
10     3 NA     2017-08-19 08:15:00 NA         23
11     3 female 2017-08-20 23:15:00 fish       NA
12     3 NA     2017-08-21 06:30:00 NA        243
  • Related