I have a dataset that looks like this.
library(tidyverse)
data <- structure(list(Timestamp = c("12/7/2021 11:29:15", "12/7/2021 11:29:35",
"12/7/2021 11:59:18", "12/7/2021 12:28:53",
"12/7/2021 13:19:12"),
ID = c(157767, 380925, 319956, 375889, 375889),
Status = c("Yes", "No", "Yes", "No", "Yes")),
row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
print(data)
> data
# A tibble: 5 x 3
Timestamp ID Status
<chr> <dbl> <chr>
1 12/7/2021 11:29:15 157767 Yes
2 12/7/2021 11:29:35 380925 No
3 12/7/2021 11:59:18 319956 Yes
4 12/7/2021 12:28:53 375889 No
5 12/7/2021 13:19:12 375889 Yes
Rows 4 & 5 have the same ID. Based off the TimeStamp, I can see that row 5 was done later and has and updated status.
Is it possible for me to filter based off timestamp to only show the most recent timestamps in my data frame? my Desired output would be the following
Timestamp ID Status
<chr> <dbl> <chr>
1 12/7/2021 11:29:15 157767 Yes
2 12/7/2021 11:29:35 380925 No
3 12/7/2021 11:59:18 319956 Yes
4 12/7/2021 13:19:12 375889 Yes
Thank you.
CodePudding user response:
With dplyr::slice_max
you could do:
Note: As your Timestamp
is a character we first have to convert to a datetime to make this work (Thanks to @utubun for pointing that out in his comment)
library(dplyr)
data %>%
group_by(ID) %>%
slice_max(as.POSIXct(Timestamp, format = "%d/%m/%Y %H:%M:%S")) %>%
ungroup() %>%
arrange(Timestamp)
#> # A tibble: 4 × 3
#> Timestamp ID Status
#> <chr> <dbl> <chr>
#> 1 12/7/2021 11:29:15 157767 Yes
#> 2 12/7/2021 11:29:35 380925 No
#> 3 12/7/2021 11:59:18 319956 Yes
#> 4 12/7/2021 13:19:12 375889 Yes
CodePudding user response:
- First transform
Timestamp
todttm
class. - group and filter by
last()
library(dplyr)
library(lubridate)
data %>%
mutate(Timestamp = dmy_hms(Timestamp)) %>%
group_by(ID) %>%
filter(Timestamp == last(Timestamp))
Timestamp ID Status
<dttm> <dbl> <chr>
1 2021-07-12 11:29:15 157767 Yes
2 2021-07-12 11:29:35 380925 No
3 2021-07-12 11:59:18 319956 Yes
4 2021-07-12 13:19:12 375889 Yes
``´