I have a string that contains several date-time stamps, and I want to extract the earliest one. The date-times are in the format YYYY-MM-DD HH:MM:SS:
2022-04-13 23:59:01 - System Administrator (End-user comments)
Blah Blah Blah
2022-04-06 09:57:01 - John (Smith) [Team A] (End-user comments)
Blah blah blah
2022-04-05 17:48:13 - Sarah (Johns) [Team B] (End-user comments)
Blah Blah Blah
2022-04-04 13:34:07 - Robert (Mills) [Team C] (End-user comments)
Blah Blah Blah
What I wish to derive from the above string is "2022-04-04 13:34:07".
I have over 20k observations that look something like the above. Not all of them contain four date-times as illustrated above; some have just 2, and others have more than 10.
CodePudding user response:
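We can use str_extract_all along with sort. Because the timestamps are in YYYY-MM-DD HH:MM:SS format, sorting them as strings also puts them in chronological order: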
library(stringr)  # provides str_extract_all()
input <- "2022-04-13 23:59:01 - System Administrator (End-user comments)\nBlah Blah Blah\n\n2022-04-06 09:57:01 - John (Smith) [Team A] (End-user comments)\nBlah blah blah\n\n2022-04-05 17:48:13 - Sarah (Johns) [Team B] (End-user comments)\nBlah Blah Blah\n\n2022-04-04 13:34:07 - Robert (Mills) [Team C] (End-user comments)\nBlah Blah Blah"
# Extract every timestamp; [[1]] takes the matches for the single input string
ts <- str_extract_all(input, "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}")[[1]]
# Sort ascending, so the first element is the earliest timestamp
ts <- sort(ts)
ts[1]
[1] "2022-04-04 13:34:07"
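Since the question mentions more than 20k observations, the same idea can be applied to a whole character vector at once. The following is only a minimal sketch, assuming the strings are held in a character vector; the names earliest_timestamp and obs are made up for illustration, and the sample strings are shortened versions of the one in the question. Entries without any timestamp give NA.

```r
library(stringr)

# Hypothetical helper: earliest "YYYY-MM-DD HH:MM:SS" timestamp in each string.
earliest_timestamp <- function(x) {
  pattern <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
  # str_extract_all() returns a list with one character vector per element of x
  matches <- str_extract_all(x, pattern)
  # For this fixed-width format the smallest string is the earliest timestamp;
  # strings with no match return NA instead of raising an error
  vapply(
    matches,
    function(m) if (length(m) == 0) NA_character_ else min(m),
    character(1)
  )
}

# Illustrative input (assumed data, not from the real observations)
obs <- c(
  "2022-04-13 23:59:01 - System Administrator\nBlah\n2022-04-04 13:34:07 - Robert\nBlah",
  "2022-04-06 09:57:01 - John\nBlah",
  "no timestamp here"
)
earliest_timestamp(obs)
```

With the sample vector above, this should return "2022-04-04 13:34:07", "2022-04-06 09:57:01" and NA. The result is still character; wrap it in as.POSIXct() or lubridate::ymd_hms() if real date-times are needed.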
CodePudding user response:
Another option:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
"2022-04-13 23:59:01 - System Administrator (End-user comments)
Blah Blah Blah
2022-04-06 09:57:01 - John (Smith) [Team A] (End-user comments)
Blah blah blah
2022-04-05 17:48:13 - Sarah (Johns) [Team B] (End-user comments)
Blah Blah Blah
2022-04-04 13:34:07 - Robert (Mills) [Team C] (End-user comments)
Blah Blah Blah" |>
str_extract_all("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}") |>
as_vector() |>
enframe(value = "date") |>
mutate(date = ymd_hms(date)) |>
filter(date == min(date))
#> # A tibble: 1 × 2
#>    name date
#>   <int> <dttm>
#> 1     4 2022-04-04 13:34:07
Created on 2022-05-19 by the reprex package (v2.0.1)
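If the observations live in a data frame, the same pipeline can be run per row. This is a rough sketch under assumptions: the data frame df and its columns id and comments are hypothetical, and the sample text is abbreviated from the question. It relies on tidyr::unnest() to get one row per extracted timestamp and then takes the minimum within each id.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Hypothetical data: one row per observation, free text in `comments`
df <- tibble(
  id = 1:2,
  comments = c(
    "2022-04-13 23:59:01 - System Administrator\nBlah\n2022-04-04 13:34:07 - Robert\nBlah",
    "2022-04-06 09:57:01 - John\nBlah"
  )
)

pattern <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"

result <- df |>
  # List-column holding all timestamps found in each row
  mutate(ts = str_extract_all(comments, pattern)) |>
  # One row per timestamp; rows with no match are dropped by default
  unnest(ts) |>
  # Parse to date-time (ymd_hms() uses UTC unless told otherwise)
  mutate(ts = ymd_hms(ts)) |>
  group_by(id) |>
  summarise(earliest = min(ts))

result
```

Rows whose text contains no timestamp disappear from the result; passing keep_empty = TRUE to unnest() would keep them with a missing value instead.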