R extract earliest date from string

Time:05-20

I have a string that contains several date-time stamps, and I want to extract the earliest one. The date-times are in the format YYYY-MM-DD HH:MM:SS:

2022-04-13 23:59:01 - System Administrator (End-user comments)
Blah Blah Blah

2022-04-06 09:57:01 - John (Smith) [Team A]  (End-user comments)
Blah blah blah
    
2022-04-05 17:48:13 - Sarah (Johns) [Team B]  (End-user comments)
Blah Blah Blah

2022-04-04 13:34:07 - Robert (Mills) [Team C]  (End-user comments)
Blah Blah Blah

From the string above, I want to get "2022-04-04 13:34:07".

I have over 20k observations that look similar to this. Not every observation has four date-times like the example above; some have only two, others more than ten.

Answer:

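We can use str_extract_all() from stringr along with sort(). Because every timestamp is zero-padded in the same YYYY-MM-DD HH:MM:SS layout, alphabetical order is also chronological order: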

library(stringr)

input <- "2022-04-13 23:59:01 - System Administrator (End-user comments)\nBlah Blah Blah\n\n2022-04-06 09:57:01 - John (Smith) [Team A]  (End-user comments)\nBlah blah blah\n\n2022-04-05 17:48:13 - Sarah (Johns) [Team B]  (End-user comments)\nBlah Blah Blah\n\n2022-04-04 13:34:07 - Robert (Mills) [Team C]  (End-user comments)\nBlah Blah Blah"
# Extract every "YYYY-MM-DD HH:MM:SS" match; [[1]] takes the result for the single input string
ts <- str_extract_all(input, "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}")[[1]]
ts <- sort(ts)
ts[1]  # earliest timestamp

[1] "2022-04-04 13:34:07"
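The answer above works on one string. For a whole column of observations (the 20k in the question), a minimal base-R sketch applies the same idea to each element. It assumes each element of the character vector is one observation's text; `earliest_timestamp` and `obs` are hypothetical names, and elements with no timestamp return NA.

```r
# Hypothetical helper: earliest "YYYY-MM-DD HH:MM:SS" stamp in each element of x
earliest_timestamp <- function(x) {
  pattern <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
  # gregexpr() finds every match per element; regmatches() returns them as a list
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Zero-padded stamps compare as strings in chronological order,
  # so min() on each character vector gives the earliest one
  vapply(matches,
         function(m) if (length(m) == 0) NA_character_ else min(m),
         character(1))
}

# Hypothetical sample observations
obs <- c(
  "2022-04-13 23:59:01 - A\nBlah\n\n2022-04-04 13:34:07 - B\nBlah",
  "2022-05-01 08:00:00 - C\nBlah\n\n2022-05-02 09:00:00 - D\nBlah",
  "no timestamps here"
)
earliest_timestamp(obs)
# [1] "2022-04-04 13:34:07" "2022-05-01 08:00:00" NA
```

The result is one value per observation, so it can be assigned straight back to a data frame column.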

Answer:

Another option, using lubridate to parse the matches into date-times and keep the minimum:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

"2022-04-13 23:59:01 - System Administrator (End-user comments)
Blah Blah Blah

2022-04-06 09:57:01 - John (Smith) [Team A]  (End-user comments)
Blah blah blah

2022-04-05 17:48:13 - Sarah (Johns) [Team B]  (End-user comments)
Blah Blah Blah

2022-04-04 13:34:07 - Robert (Mills) [Team C]  (End-user comments)
Blah Blah Blah" |> 
  str_extract_all("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}") |> 
  unlist() |> 
  enframe(value = "date") |> 
  mutate(date = ymd_hms(date)) |> 
  filter(date == min(date))
#> # A tibble: 1 × 2
#>    name date               
#>   <int> <dttm>             
#> 1     4 2022-04-04 13:34:07

Created on 2022-05-19 by the reprex package (v2.0.1)
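To apply the lubridate approach to every observation at once, one option is to unnest the matches and take the minimum per group. This is a sketch that assumes the data sit in a data frame with hypothetical columns `id` and `text`:

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Hypothetical data: one row per observation, free text in `text`
df <- tibble(
  id = 1:2,
  text = c(
    "2022-04-13 23:59:01 - A\nBlah\n\n2022-04-04 13:34:07 - B\nBlah",
    "2022-05-01 08:00:00 - C\nBlah\n\n2022-05-02 09:00:00 - D\nBlah"
  )
)

earliest <- df |>
  # list column: all matched timestamps for each row
  mutate(stamp = str_extract_all(text, "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}")) |>
  # one row per timestamp
  unnest(stamp) |>
  mutate(stamp = ymd_hms(stamp)) |>
  group_by(id) |>
  summarise(earliest = min(stamp))

earliest
```

With unnest()'s default settings, observations that contain no timestamp are dropped, so they will be missing from `earliest`.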
