Filtering data to one daily point per individual

Time:03-24

I have a dataset containing information about certain individuals on many different dates. I want to filter it so that there is only one row per date per individual.

I know how to filter it down to one data point per date overall, but not how to keep one point per day for each individual.

    ID       Date     Time            Datetime      Long    Lat       Status
    305 2022-02-12  4:30:37 2022-02-12 04:30:00 -89.71239 48.00947    MN
    305 2022-02-12  0:00:37 2022-02-12 00:00:00 -89.71250 48.00948    MN
    306 2022-02-12  4:30:37 2022-02-12 04:30:00 -89.71239 48.00947    MN
    306 2022-02-12  0:00:37 2022-02-12 00:00:00 -89.71250 48.00948    MN
    306 2022-04-06  4:30:37 2022-04-06 04:30:00 -89.71239 48.00947    MN
    306 2022-02-12  0:00:37 2022-02-12 00:00:00 -89.71250 48.00948    MN

The result should be:

    ID       Date     Time            Datetime      Long    Lat       Status
    305 2022-02-12  4:30:37 2022-02-12 04:30:00 -89.71239 48.00947    MN
    306 2022-02-12  4:30:37 2022-02-12 04:30:00 -89.71239 48.00947    MN
    306 2022-04-06  4:30:37 2022-04-06 04:30:00 -89.71239 48.00947    MN

Answer:

Group by both ID and Date, then keep the first row of each group:

    library(dplyr)

    df %>% 
      group_by(ID, Date) %>%   # one group per individual per calendar day
      summarise_all(first)     # take every column's value from the group's first row
    #> # A tibble: 3 x 7
    #> # Groups:   ID [2]
    #>      ID Date       Time    Datetime             Long   Lat Status
    #>   <int> <chr>      <chr>   <chr>               <dbl> <dbl> <chr> 
    #> 1   305 2022-02-12 4:30:37 2022-02-12 04:30:00 -89.7  48.0 MN    
    #> 2   306 2022-02-12 4:30:37 2022-02-12 04:30:00 -89.7  48.0 MN    
    #> 3   306 2022-04-06 4:30:37 2022-04-06 04:30:00 -89.7  48.0 MN  
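If you would rather not depend on dplyr, a base R sketch of the same idea follows. This swaps in a different technique from the answer above: `duplicated()` applied to the `ID` and `Date` columns together flags every repeat of an individual-day pair, so negating it keeps the first row of each pair in the original row order. The object names `one_per_day` and `one_per_date` are illustrative only. The last step contrasts it with keying on `Date` alone, which is the "one point per overall date" filter described in the question.

```r
# Recreate the sample data from the question (same rows, same order)
df <- data.frame(
  ID       = c(305L, 305L, 306L, 306L, 306L, 306L),
  Date     = c("2022-02-12", "2022-02-12", "2022-02-12",
               "2022-02-12", "2022-04-06", "2022-02-12"),
  Time     = c("4:30:37", "0:00:37", "4:30:37", "0:00:37", "4:30:37", "0:00:37"),
  Datetime = c("2022-02-12 04:30:00", "2022-02-12 00:00:00",
               "2022-02-12 04:30:00", "2022-02-12 00:00:00",
               "2022-04-06 04:30:00", "2022-02-12 00:00:00"),
  Long     = c(-89.71239, -89.7125, -89.71239, -89.7125, -89.71239, -89.7125),
  Lat      = c(48.00947, 48.00948, 48.00947, 48.00948, 48.00947, 48.00948),
  Status   = rep("MN", 6),
  stringsAsFactors = FALSE
)

# Flag rows whose (ID, Date) pair already appeared earlier, then drop them:
# this keeps the first row of each individual-day combination.
one_per_day <- df[!duplicated(df[c("ID", "Date")]), ]
rownames(one_per_day) <- NULL  # renumber rows 1..n for tidy printing
one_per_day

# For contrast: keying on Date alone keeps one row per calendar day overall,
# which drops individual 306's row for 2022-02-12.
one_per_date <- df[!duplicated(df$Date), ]
one_per_date
```

On the sample data, `one_per_day` holds the same three rows as the dplyr result, while `one_per_date` has only two, because 306's 2022-02-12 row counts as a repeat of 305's date.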


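Note that `first` in the answer keeps whichever row comes first in the current row order, which for this data is the 04:30 fix, not the earliest one of the day. If "one point per day" should instead mean the earliest fix, one option is to sort by time before de-duplicating. The sketch below is base R and assumes that `Datetime` is always in `YYYY-MM-DD HH:MM:SS` form and can be treated as UTC; `sorted` and `earliest` are illustrative names.

```r
# Rebuild the sample data from the question
df <- data.frame(
  ID       = c(305L, 305L, 306L, 306L, 306L, 306L),
  Date     = c("2022-02-12", "2022-02-12", "2022-02-12",
               "2022-02-12", "2022-04-06", "2022-02-12"),
  Time     = c("4:30:37", "0:00:37", "4:30:37", "0:00:37", "4:30:37", "0:00:37"),
  Datetime = c("2022-02-12 04:30:00", "2022-02-12 00:00:00",
               "2022-02-12 04:30:00", "2022-02-12 00:00:00",
               "2022-04-06 04:30:00", "2022-02-12 00:00:00"),
  Long     = c(-89.71239, -89.7125, -89.71239, -89.7125, -89.71239, -89.7125),
  Lat      = c(48.00947, 48.00948, 48.00947, 48.00948, 48.00947, 48.00948),
  Status   = rep("MN", 6),
  stringsAsFactors = FALSE
)

# Assumption: Datetime is "YYYY-MM-DD HH:MM:SS"; parse it as UTC so it sorts chronologically
df$Datetime <- as.POSIXct(df$Datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Sort so that, within each individual and day, the earliest fix comes first
sorted <- df[order(df$ID, df$Date, df$Datetime), ]

# Keep the first (now earliest) row of each (ID, Date) pair
earliest <- sorted[!duplicated(sorted[c("ID", "Date")]), ]
rownames(earliest) <- NULL
earliest
```

With the sample data this keeps the 00:00 rows for both individuals on 2022-02-12 and the single 04:30 row for 2022-04-06.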
Data

    df <- structure(list(ID = c(305L, 305L, 306L, 306L, 306L, 306L), Date = c("2022-02-12", 
    "2022-02-12", "2022-02-12", "2022-02-12", "2022-04-06", "2022-02-12"
    ), Time = c("4:30:37", "0:00:37", "4:30:37", "0:00:37", "4:30:37", 
    "0:00:37"), Datetime = c("2022-02-12 04:30:00", "2022-02-12 00:00:00", 
    "2022-02-12 04:30:00", "2022-02-12 00:00:00", "2022-04-06 04:30:00", 
    "2022-02-12 00:00:00"), Long = c(-89.71239, -89.7125, -89.71239, 
    -89.7125, -89.71239, -89.7125), Lat = c(48.00947, 48.00948, 48.00947, 
    48.00948, 48.00947, 48.00948), Status = c("MN", "MN", "MN", "MN", 
    "MN", "MN")), class = "data.frame", row.names = c(NA, -6L))