Keeping pairs of data that have the same date and time in R-CodePudding

I have 2 sets of data that look like this (this is a very small subset of it).

data1 <- data.frame("Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"), "Type" = 
c("F", "F", "F", "F", "F", "F", "F"), "Date" = c("2000-01-01", "2000-01-01", "2000- 
01-02", "2000-01-03",                                                                                        
"2000-01-03", "2000-01-07", "2000-01-07"), "Time" = c("11:00:00", "12:00:00", 
"15:00:00", "13:00:00", "17:00:00", "20:00:00", "20:00:00"), "Value" = c(100, 200, 
300, 100, 400, 500, 500))

data2 <- data.frame("Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"), "Type" = 
c("P", "P", 
                                                                      "P", "P", "P", 
"P", "P"), "Date" = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03", "2000- 
01-03", 
                                                                                                 
"2000-01-04", "2000-01-07"), "Time" = c("11:00:00", "11:00:00", "14:00:00", 
"17:00:00", "13:00:00", "16:00:00", "20:00:00"), "Value" = c(100, 100, 200, 900, 100, 
400, 999))

I want to keep data from both tables that have the same date and time and create a new table (data3). Sometimes within data1 and data2, there will be duplicates, I don't want data3 to contain those duplicates, just 1 of them and with its pair from the other table. I would also like the output table to be ordered to show the pairs from each table under each other (so my "Type" column would be alternating F, P, F, P, etc.).

Here is my desired output

data3 <- data.frame("Metal" = c("Al", "Al", "Al", "Al", "Al", 
"Al", "Al", "Al"), "Type" = c("F", "P", "F", 
                                                                
"P", "F", "P", "F", "P"), "Date" = c("2000-01-01", "2000-01-01", 
"2000-01-03", "2000-01-03", "2000-01-03", "2000-01-03", "2001-01- 
07", "2001-01-07"), "Time" = 
                  c("11:00:00", "11:00:00", "13:00:00", 
"13:00:00", "17:00:00", "17:00:00", "20:00:00", "20:00:00"), 
"Value" = c(100, 100, 100, 100, 400, 900, 500, 999))

I have tried using various types of joins from dplyr, but they aren't joining the way I'd like it to.

Thank you for your help!!

CodePudding user response：

We may need bind the data, and then filter out the duplicates after grouping

library(dplyr)
library(data.table)
bind_rows(data1, data2, .id = 'grp')%>%
  group_by(Metal, Date, Time) %>%
  filter(n() > 1) %>%
  arrange(Date, Time, rowid(grp)) %>%
  slice(match(c("F", "P"), Type)) %>%
  ungroup %>% 
  select(-grp)

-output

# A tibble: 8 × 5
  Metal Type  Date       Time     Value
  <chr> <chr> <chr>      <chr>    <dbl>
1 Al    F     2000-01-01 11:00:00   100
2 Al    P     2000-01-01 11:00:00   100
3 Al    F     2000-01-03 13:00:00   100
4 Al    P     2000-01-03 13:00:00   100
5 Al    F     2000-01-03 17:00:00   400
6 Al    P     2000-01-03 17:00:00   900
7 Al    F     2000-01-07 20:00:00   500
8 Al    P     2000-01-07 20:00:00   999

-OP's data

> data3
  Metal Type       Date     Time Value
1    Al    F 2000-01-01 11:00:00   100
2    Al    P 2000-01-01 11:00:00   100
3    Al    F 2000-01-03 13:00:00   100
4    Al    P 2000-01-03 13:00:00   100
5    Al    F 2000-01-03 17:00:00   400
6    Al    P 2000-01-03 17:00:00   900
7    Al    F 2001-01-07 20:00:00   500
8    Al    P 2001-01-07 20:00:00   999

CodePudding user response：

This was not easy :-)

library(dplyr)

bind_rows(data1, data2) %>% 
  group_by(Date, Time) %>% 
  filter(n()>1) %>% 
  ungroup() %>% 
  group_by(Type) %>% 
  arrange(Time) %>% 
  ungroup() %>% 
  mutate(Flag = ifelse(Type == "P" & lag(Type, default = last(Type)) == "F", 1, NA)) %>% 
  mutate(Flag1 = lead(Flag)) %>% 
  filter(if_any(.cols = starts_with("Flag"), .fns = ~ . == 1)) %>% 
  select(-starts_with("Flag"))

  Metal Type  Date       Time     Value
  <chr> <chr> <chr>      <chr>    <dbl>
1 Al    F     2000-01-01 11:00:00   100
2 Al    P     2000-01-01 11:00:00   100
3 Al    F     2000-01-03 13:00:00   100
4 Al    P     2000-01-03 13:00:00   100
5 Al    F     2000-01-03 17:00:00   400
6 Al    P     2000-01-03 17:00:00   900
7 Al    F     2000-01-07 20:00:00   500
8 Al    P     2000-01-07 20:00:00   999

CodePudding user response：

There are some methods using the tidyverse package that should work for the ordering, merged with parts of Beeflight31's solution:

data3 <- rbind(data1, data2)
data3 <- data3 %>%
  distinct(Metal, Date, Time, Value, .keep_all=TRUE) %>%
  group_by(Date, Time, Metal)

CodePudding user response：

Edit: Is that what you're trying to get ?

data3 <- rbind(data1,data2)
data3$unique_date <- paste(data3$Date, data3$Time, sep="_")
matching_rows <- which(duplicated(data3$unique_date))
data3 <- data3[matching_rows,]
data3 <- data3[!duplicated(data3$unique_date),]