Home > Blockchain >  Keep only common dates using dplyr package
Keep only common dates using dplyr package

Time:01-15

I have a data frame with 3 columns: station, datetime and miles. I want to keep the rows that have common datetime for each station - only common datetimes amongst ALL STATIONS.

I've created a reproducible example to be more clear

library(tidyverse)
library(lubridate)

df <- data.frame(station = c("A","A","A","B","B","B",
                                  "C","C","C"),
  date = c("1998-05-03","1999-06-01","2000-03-02",
           "1998-05-03","1999-06-01","2000-03-02",
           "1998-05-03","1999-06-01","2000-04-15"),
                      time  = c("00:00:10","00:00:20","00:00:50",
                                "00:00:10","00:00:20","00:00:40",
                                "00:00:34","00:00:20","00:00:40"),
                      miles = rnorm(9))

df <- df %>% 
  mutate(datetime = paste(date,time,sep = " "),
         datetime = as_datetime(datetime)) %>% 
  select(station,datetime,miles)
station datetime miles
A 1998-05-03 00:00:10 1.8587913
A 1999-06-01 00:00:20 0.1271054
A 2000-03-02 00:00:50 1.4531250
B 1998-05-03 00:00:10 0.3544122
B 1999-06-01 00:00:20 0.1033785
B 2000-03-02 00:00:40 0.9861990
C 1998-05-03 00:00:34 1.5029350
C 1999-06-01 00:00:20 1.1215914
C 2000-04-15 00:00:40 0.5222949

Desirable output

station datetime miles
A 1999-06-01 00:00:20 0.1271054
B 1999-06-01 00:00:20 0.1033785
C 1999-06-01 00:00:20 1.1215914

So, only "1999-06-01 00:00:20" is available datetime object in ALL stations.

I tried the following code but it didn't to work :

df %>%
  filter(station %in% (split(df$station, df$datetime) %>% reduce(intersect)))

Do you have any suggestion ? (I prefer dplyr solution but any solution is acceptable)

CodePudding user response:

We may group by datetime and use filter with n_distinct

library(dplyr)
 df %>%
   group_by(datetime) %>% 
   filter(n_distinct(station) == n_distinct(df$station)) %>%
   ungroup

-output

# A tibble: 3 × 3
  station datetime             miles
  <chr>   <dttm>               <dbl>
1 A       1999-06-01 00:00:20  0.209
2 B       1999-06-01 00:00:20 -0.474
3 C       1999-06-01 00:00:20  0.764

Or using the OP's code, it should be splti on the 'datetime'

df %>%
  filter(datetime %in% (split(datetime, station) %>% 
       reduce(intersect)))
station            datetime      miles
1       A 1999-06-01 00:00:20  0.2093250
2       B 1999-06-01 00:00:20 -0.4744889
3       C 1999-06-01 00:00:20  0.7644213

CodePudding user response:

Code

df %>% 
  mutate(aux = n_distinct(station)) %>% 
  group_by(datetime) %>% 
  filter(n() == aux) %>% 
  select(-aux)

Output

 # A tibble: 3 x 3
# Groups:   datetime [1]
  station datetime             miles
  <chr>   <dttm>               <dbl>
1 A       1999-06-01 00:00:20  2.09 
2 B       1999-06-01 00:00:20  0.154
3 C       1999-06-01 00:00:20 -0.951

CodePudding user response:

Please try

grp <- length(unique(df$station))

df2 <- df %>% dplyr::arrange(date, time) %>% 
  dplyr::group_by(date,time) %>% mutate(n=n()) %>%  
  filter(n==grp) %>% dplyr::arrange(station) 
  • Related