I have 2 sets of data that look like this (this is a very small subset of it).
data1 <- data.frame("Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"), "Type" =
c("F", "F", "F", "F", "F", "F", "F"), "Date" = c("2000-01-01", "2000-01-01", "2000-
01-02", "2000-01-03",
"2000-01-03", "2000-01-07", "2000-01-07"), "Time" = c("11:00:00", "12:00:00",
"15:00:00", "13:00:00", "17:00:00", "20:00:00", "20:00:00"), "Value" = c(100, 200,
300, 100, 400, 500, 500))
data2 <- data.frame("Metal" = c("Al", "Al", "Al", "Al", "Al", "Al", "Al"), "Type" =
c("P", "P",
"P", "P", "P",
"P", "P"), "Date" = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-03", "2000-
01-03",
"2000-01-04", "2000-01-07"), "Time" = c("11:00:00", "11:00:00", "14:00:00",
"17:00:00", "13:00:00", "16:00:00", "20:00:00"), "Value" = c(100, 100, 200, 900, 100,
400, 999))
I want to keep data from both tables that have the same date and time and create a new table (data3). Sometimes within data1 and data2, there will be duplicates, I don't want data3 to contain those duplicates, just 1 of them and with its pair from the other table. I would also like the output table to be ordered to show the pairs from each table under each other (so my "Type" column would be alternating F, P, F, P, etc.).
Here is my desired output
data3 <- data.frame("Metal" = c("Al", "Al", "Al", "Al", "Al",
"Al", "Al", "Al"), "Type" = c("F", "P", "F",
"P", "F", "P", "F", "P"), "Date" = c("2000-01-01", "2000-01-01",
"2000-01-03", "2000-01-03", "2000-01-03", "2000-01-03", "2001-01-
07", "2001-01-07"), "Time" =
c("11:00:00", "11:00:00", "13:00:00",
"13:00:00", "17:00:00", "17:00:00", "20:00:00", "20:00:00"),
"Value" = c(100, 100, 100, 100, 400, 900, 500, 999))
I have tried using various types of joins from dplyr, but they aren't joining the way I'd like it to.
Thank you for your help!!
CodePudding user response:
We may need bind the data, and then filter
out the duplicates after grouping
library(dplyr)
library(data.table)
bind_rows(data1, data2, .id = 'grp')%>%
group_by(Metal, Date, Time) %>%
filter(n() > 1) %>%
arrange(Date, Time, rowid(grp)) %>%
slice(match(c("F", "P"), Type)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 8 × 5
Metal Type Date Time Value
<chr> <chr> <chr> <chr> <dbl>
1 Al F 2000-01-01 11:00:00 100
2 Al P 2000-01-01 11:00:00 100
3 Al F 2000-01-03 13:00:00 100
4 Al P 2000-01-03 13:00:00 100
5 Al F 2000-01-03 17:00:00 400
6 Al P 2000-01-03 17:00:00 900
7 Al F 2000-01-07 20:00:00 500
8 Al P 2000-01-07 20:00:00 999
-OP's data
> data3
Metal Type Date Time Value
1 Al F 2000-01-01 11:00:00 100
2 Al P 2000-01-01 11:00:00 100
3 Al F 2000-01-03 13:00:00 100
4 Al P 2000-01-03 13:00:00 100
5 Al F 2000-01-03 17:00:00 400
6 Al P 2000-01-03 17:00:00 900
7 Al F 2001-01-07 20:00:00 500
8 Al P 2001-01-07 20:00:00 999
CodePudding user response:
This was not easy :-)
library(dplyr)
bind_rows(data1, data2) %>%
group_by(Date, Time) %>%
filter(n()>1) %>%
ungroup() %>%
group_by(Type) %>%
arrange(Time) %>%
ungroup() %>%
mutate(Flag = ifelse(Type == "P" & lag(Type, default = last(Type)) == "F", 1, NA)) %>%
mutate(Flag1 = lead(Flag)) %>%
filter(if_any(.cols = starts_with("Flag"), .fns = ~ . == 1)) %>%
select(-starts_with("Flag"))
Metal Type Date Time Value
<chr> <chr> <chr> <chr> <dbl>
1 Al F 2000-01-01 11:00:00 100
2 Al P 2000-01-01 11:00:00 100
3 Al F 2000-01-03 13:00:00 100
4 Al P 2000-01-03 13:00:00 100
5 Al F 2000-01-03 17:00:00 400
6 Al P 2000-01-03 17:00:00 900
7 Al F 2000-01-07 20:00:00 500
8 Al P 2000-01-07 20:00:00 999
CodePudding user response:
There are some methods using the tidyverse package that should work for the ordering, merged with parts of Beeflight31's solution:
data3 <- rbind(data1, data2)
data3 <- data3 %>%
distinct(Metal, Date, Time, Value, .keep_all=TRUE) %>%
group_by(Date, Time, Metal)
CodePudding user response:
Edit: Is that what you're trying to get ?
data3 <- rbind(data1,data2)
data3$unique_date <- paste(data3$Date, data3$Time, sep="_")
matching_rows <- which(duplicated(data3$unique_date))
data3 <- data3[matching_rows,]
data3 <- data3[!duplicated(data3$unique_date),]