I have a data frame with 3 columns: station, datetime and miles. I want to keep the rows that have common datetime for each station - only common datetimes amongst ALL STATIONS.
I've created a reproducible example to be more clear
library(tidyverse)
library(lubridate)
df <- data.frame(station = c("A","A","A","B","B","B",
"C","C","C"),
date = c("1998-05-03","1999-06-01","2000-03-02",
"1998-05-03","1999-06-01","2000-03-02",
"1998-05-03","1999-06-01","2000-04-15"),
time = c("00:00:10","00:00:20","00:00:50",
"00:00:10","00:00:20","00:00:40",
"00:00:34","00:00:20","00:00:40"),
miles = rnorm(9))
df <- df %>%
mutate(datetime = paste(date,time,sep = " "),
datetime = as_datetime(datetime)) %>%
select(station,datetime,miles)
station | datetime | miles |
---|---|---|
A | 1998-05-03 00:00:10 | 1.8587913 |
A | 1999-06-01 00:00:20 | 0.1271054 |
A | 2000-03-02 00:00:50 | 1.4531250 |
B | 1998-05-03 00:00:10 | 0.3544122 |
B | 1999-06-01 00:00:20 | 0.1033785 |
B | 2000-03-02 00:00:40 | 0.9861990 |
C | 1998-05-03 00:00:34 | 1.5029350 |
C | 1999-06-01 00:00:20 | 1.1215914 |
C | 2000-04-15 00:00:40 | 0.5222949 |
Desirable output
station | datetime | miles |
---|---|---|
A | 1999-06-01 00:00:20 | 0.1271054 |
B | 1999-06-01 00:00:20 | 0.1033785 |
C | 1999-06-01 00:00:20 | 1.1215914 |
So, only "1999-06-01 00:00:20" is available datetime object in ALL stations.
I tried the following code but it didn't to work :
df %>%
filter(station %in% (split(df$station, df$datetime) %>% reduce(intersect)))
Do you have any suggestion ? (I prefer dplyr solution but any solution is acceptable)
CodePudding user response:
We may group by datetime and use filter
with n_distinct
library(dplyr)
df %>%
group_by(datetime) %>%
filter(n_distinct(station) == n_distinct(df$station)) %>%
ungroup
-output
# A tibble: 3 × 3
station datetime miles
<chr> <dttm> <dbl>
1 A 1999-06-01 00:00:20 0.209
2 B 1999-06-01 00:00:20 -0.474
3 C 1999-06-01 00:00:20 0.764
Or using the OP's code, it should be splti
on the 'datetime'
df %>%
filter(datetime %in% (split(datetime, station) %>%
reduce(intersect)))
station datetime miles
1 A 1999-06-01 00:00:20 0.2093250
2 B 1999-06-01 00:00:20 -0.4744889
3 C 1999-06-01 00:00:20 0.7644213
CodePudding user response:
Code
df %>%
mutate(aux = n_distinct(station)) %>%
group_by(datetime) %>%
filter(n() == aux) %>%
select(-aux)
Output
# A tibble: 3 x 3
# Groups: datetime [1]
station datetime miles
<chr> <dttm> <dbl>
1 A 1999-06-01 00:00:20 2.09
2 B 1999-06-01 00:00:20 0.154
3 C 1999-06-01 00:00:20 -0.951
CodePudding user response:
Please try
grp <- length(unique(df$station))
df2 <- df %>% dplyr::arrange(date, time) %>%
dplyr::group_by(date,time) %>% mutate(n=n()) %>%
filter(n==grp) %>% dplyr::arrange(station)