I have two dataframes
df1 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))
I want to merge them into a single data frame by timeStamp, where the times are matched in chronological order, and the mismatching columns are simply filled in with NA. So it would read
timestamp value text
"2022-07-29 00:00:00" 1 NA
"2022-07-29 00:00:05" NA a
...
I can't for my life figure out why simply merge(x,y,by=) doesn't work for this. neither does left_join. It seems so simple and I can't seem to figure it out.
Also, a bonus would be to make it work this way as well, in which each time there is a change in text, all the rows below it remain the last text until the text changes again prompted by df2:
timestamp value text
"2022-07-29 00:00:00" 1 NA
"2022-07-29 00:00:05" NA a
"2022-07-29 00:05:00" 2 a
"2022-07-29 00:05:05" NA b
"2022-07-29 00:10:00" 3 b
"2022-07-29 00:15:00" 4 b
"2022-07-29 00:20:00" 5 b
"2022-07-29 00:20:05" NA c
Thanks
CodePudding user response:
You need to do a full join.
tidyverse solution:
df1 %>%
full_join(df2, c("timeStamp" = "timeStamp")) %>%
arrange(timeStamp) %>%
tidyr::fill(text)
timeStamp value text
1 2022-07-29 00:00:00 1 <NA>
2 2022-07-29 00:00:05 NA a
3 2022-07-29 00:05:00 2 a
4 2022-07-29 00:05:05 NA b
5 2022-07-29 00:10:00 3 b
6 2022-07-29 00:15:00 4 b
7 2022-07-29 00:20:00 5 b
8 2022-07-29 00:20:05 NA c
Base R merge:
merge(df1, df2, by = c("timeStamp" = "timeStamp"), all = T)
CodePudding user response:
Using the corrected definitions of df1 and df2 in the Note at the end perform a merge
using all = TRUE
and then use na.locf0
to fill in the NA's.
library(zoo)
m <- merge(df1, df2, all = TRUE) |> transform(text = na.locf0(text)); m
## timeStamp value text
## 1 2022-07-29 00:00:00 1 <NA>
## 2 2022-07-29 00:00:05 NA a
## 3 2022-07-29 00:05:00 2 a
## 4 2022-07-29 00:05:05 NA b
## 5 2022-07-29 00:10:00 3 b
## 6 2022-07-29 00:15:00 4 b
## 7 2022-07-29 00:20:00 5 b
## 8 2022-07-29 00:20:05 NA c
If the different values of text actually refer to different series that should be put in separate columns then we can create the following zoo series. (Use fortify.zoo(z)
if you need to convert that to a data frame.)
z <- read.zoo(m, split = "text"); z
## a b c
## 2022-07-29 00:00:05 NA NA NA
## 2022-07-29 00:05:00 2 NA NA
## 2022-07-29 00:05:05 NA NA NA
## 2022-07-29 00:10:00 NA 3 NA
## 2022-07-29 00:15:00 NA 4 NA
## 2022-07-29 00:20:00 NA 5 NA
## 2022-07-29 00:20:05 NA NA NA
Note
df1 <- data.frame(timeStamp = as.POSIXct(c("2022-07-29 00:00:00",
"2022-07-29 00:05:00", "2022-07-29 00:10:00","2022-07-29 00:15:00",
"2022-07-29 00:20:00")), value = c(1,2,3,4,5))
df2 <- data.frame(timeStamp = as.POSIXct(c("2022-07-29 00:00:05",
"2022-07-29 00:05:05", "2022-07-29 00:20:05")),
text = c("a","b","c"))