R_how can I use str_sub to split date and time

Time:03-03

The following file names were used in a camera trap study. The S number represents the site, P is the plot within a site, C is the camera number within the plot, the first string of numbers is the YearMonthDay and the second string of numbers is the HourMinuteSecond.

file.names <- c( 'S123.P2.C10_20120621_213422.jpg',
                 'S10.P1.C1_20120622_050148.jpg',
                 'S187.P2.C2_20120702_023501.jpg')
file.names

Use a combination of str_sub() and str_split() to produce a data frame with columns corresponding to the site, plot, camera, year, month, day, hour, minute, and second for these three file names. So we want code that will create this data frame:

Site Plot Camera Year Month Day Hour Minute Second
S123 P2 C10 2012 06 21 21 34 22
S10 P1 C1 2012 06 22 05 01 48
S187 P2 C2 2012 07 02 02 35 01

My code so far is below:

file.names %>%
  str_sub(start = 1, end = -5) %>%
  str_replace_all("_", ".") %>%
  str_split(pattern = fixed("."), n = 5)

I have no idea how to split the date and time into their parts from here.

CodePudding user response:

nms <- c("Site", "Plot", "Camera", "Year", "Month", "Day", "Hour", "Minute", "Second")

library(tidyverse)
data.frame(file.names) %>%
  extract(file.names, nms, 
          '(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})')

  Site Plot Camera Year Month Day Hour Minute Second
1 S123   P2    C10 2012    06  21   21     34     22
2  S10   P1     C1 2012    06  22   05     01     48
3 S187   P2     C2 2012    07  02   02     35     01

In base R:

type.convert(strcapture('(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})',
           file.names, as.list(setNames(character(length(nms)), nms))), as.is = TRUE)

  Site Plot Camera Year Month Day Hour Minute Second
1 S123   P2    C10 2012     6  21   21     34     22
2  S10   P1     C1 2012     6  22    5      1     48
3 S187   P2     C2 2012     7   2    2     35      1
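
If the exercise has to be done with str_sub() and str_split() specifically, the following is one possible sketch rather than the only way to do it. It assumes the stringr package is installed, that every file name ends in a four-character extension such as ".jpg", and that the date and time blocks are always 8 and 6 digits long. The names stem, parts, and result are made up for illustration.

```r
library(stringr)

file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Drop the last four characters (".jpg"), as in the question's own code.
stem <- str_sub(file.names, start = 1, end = -5)

# Split on either "." or "_"; simplify = TRUE returns a character matrix
# with one row per file and five columns:
# site, plot, camera, YYYYMMDD, HHMMSS.
parts <- str_split(stem, pattern = "[._]", simplify = TRUE)

# Slice the date and time columns by character position with str_sub().
result <- data.frame(
  Site   = parts[, 1],
  Plot   = parts[, 2],
  Camera = parts[, 3],
  Year   = str_sub(parts[, 4], 1, 4),
  Month  = str_sub(parts[, 4], 5, 6),
  Day    = str_sub(parts[, 4], 7, 8),
  Hour   = str_sub(parts[, 5], 1, 2),
  Minute = str_sub(parts[, 5], 3, 4),
  Second = str_sub(parts[, 5], 5, 6),
  stringsAsFactors = FALSE
)
result
```

All columns stay character here, so leading zeros such as "06" are preserved; wrap a column in as.integer() if numbers are wanted instead.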

CodePudding user response:

I don't know anything about str_sub or str_split other than the fact that they may be efforts to adapt the sub and strsplit functions to an alternate universe. I just learned base R and have not really seen the need to learn a new syntax. Here's a base solution:

as.Date( sub( "([^_]+[_])(\\d{8})(.+)", "\\2", file.names) , format="%Y%m%d")
[1] "2012-06-21" "2012-06-22" "2012-07-02"

You can read the sub pattern as:

1) Starting at the beginning of the string, collect all the non-underscore characters, plus the underscore that ends them, into the first capture group.
2) Then get the next 8 digits (if they exist) in a second capture group.
3) Everything that follows goes into a third capture group.

The substitution is to just return the contents of the second capture group. The conversion to Date values is straightforward. I'm assuming that should be clear from the code, but if not then see ?as.Date.
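
To see what each capture group actually holds, here is a small base R sketch. It assumes a three-group pattern of the shape described above; the variable names pattern, groups, and dates are illustrative only.

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Assumed pattern: (prefix up to and including the first "_")(8 digits)(the rest)
pattern <- "([^_]+[_])(\\d{8})(.+)"

# regexec() + regmatches() return, for each string, the full match
# followed by the contents of each capture group.
groups <- regmatches(file.names, regexec(pattern, file.names))
groups[[1]]

# Keeping only the second group leaves the YYYYMMDD block, which as.Date() can parse.
dates <- as.Date(sub(pattern, "\\2", file.names), format = "%Y%m%d")
dates
```

For the first file name, element 2 of groups[[1]] is the site/plot/camera prefix with its trailing underscore, element 3 is the date digits, and element 4 is the remainder including the time and extension.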

Here's the rest:

as.POSIXct( sub( "([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names) , format="%Y%m%d %H%M%S")
[1] "2012-06-21 21:34:22 PDT" "2012-06-22 05:01:48 PDT" "2012-07-02 02:35:01 PDT"

If you want the components broken out, convert to POSIXlt and extract the elements of the resulting list (year, mon, mday, hour, min, sec).
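
A minimal sketch of that last step, assuming a fixed time zone of "UTC" (my choice, so the result does not depend on the machine's locale; it is not in the original code). The names stamps, lt, and breakout are illustrative.

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Pull out "YYYYMMDD HHMMSS" and parse it; tz = "UTC" is an assumption.
stamps <- as.POSIXct(
  sub("([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names),
  format = "%Y%m%d %H%M%S", tz = "UTC"
)

# POSIXlt stores the date-time as a list of components.
lt <- as.POSIXlt(stamps)

breakout <- data.frame(
  Year   = lt$year + 1900,  # year is stored as years since 1900
  Month  = lt$mon + 1,      # mon is 0-based (0 = January)
  Day    = lt$mday,
  Hour   = lt$hour,
  Minute = lt$min,
  Second = lt$sec
)
breakout
```

The components come back as numbers, so leading zeros are lost; use format(stamps, "%m") and similar if zero-padded strings are needed.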
