R: how can I use str_sub to split date and time?

The following file names were used in a camera trap study. The S number represents the site, P is the plot within a site, C is the camera number within the plot, the first string of numbers is the YearMonthDay and the second string of numbers is the HourMinuteSecond.

file.names <- c( 'S123.P2.C10_20120621_213422.jpg',
                 'S10.P1.C1_20120622_050148.jpg',
                 'S187.P2.C2_20120702_023501.jpg')
file.names

Use a combination of str_sub() and str_split() to produce a data frame with columns corresponding to the site, plot, camera, year, month, day, hour, minute, and second for these three file names. The code should create this data frame:

Site	Plot	Camera	Year	Month	Day	Hour	Minute	Second
S123	P2	C10	2012	06	21	21	34	22
S10	P1	C1	2012	06	22	05	01	48
S187	P2	C2	2012	07	02	02	35	01

My code so far:

library(stringr)

# drop ".jpg", turn "_" into ".", then split into 5 pieces
file.names %>%
  str_sub(start = 1, end = -5) %>%
  str_replace_all("_", ".") %>%
  str_split(pattern = fixed("."), n = 5)

I have no idea how to split the date and time.
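A minimal sketch of finishing the str_sub()/str_split() approach from the question: with simplify = TRUE, str_split() returns a character matrix instead of a list, so the YYYYMMDD and HHMMSS pieces sit in columns 4 and 5 and str_sub() can cut them by character position. The fixed positions assume every date is exactly 8 digits and every time exactly 6, as in the three example names; the variable names are illustrative.

```r
library(stringr)

file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Drop the 4-character ".jpg" extension and turn "_" into "." so one separator remains
no_ext <- str_sub(file.names, start = 1, end = -5)
dotted <- str_replace_all(no_ext, "_", ".")

# simplify = TRUE gives a character matrix: one row per file, 5 columns
parts <- str_split(dotted, pattern = fixed("."), n = 5, simplify = TRUE)

ymd <- parts[, 4]  # e.g. "20120621"
hms <- parts[, 5]  # e.g. "213422"

# str_sub() cuts the date and time strings by position; values stay character ("06")
result <- data.frame(
  Site   = parts[, 1],
  Plot   = parts[, 2],
  Camera = parts[, 3],
  Year   = str_sub(ymd, 1, 4),
  Month  = str_sub(ymd, 5, 6),
  Day    = str_sub(ymd, 7, 8),
  Hour   = str_sub(hms, 1, 2),
  Minute = str_sub(hms, 3, 4),
  Second = str_sub(hms, 5, 6),
  stringsAsFactors = FALSE
)
result
```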

Answer:

library(tidyverse)

nms <- c("Site", "Plot", "Camera", "Year", "Month", "Day", "Hour", "Minute", "Second")

# tidyr::extract() splits one column into several, one per regex capture group
data.frame(file.names) %>%
  extract(file.names, nms, 
          '(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})')

  Site Plot Camera Year Month Day Hour Minute Second
1 S123   P2    C10 2012    06  21   21     34     22
2  S10   P1     C1 2012    06  22   05     01     48
3 S187   P2     C2 2012    07  02   02     35     01
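The extract() call above returns character columns, so Month keeps its leading zero ("06"). If numeric columns are preferred, tidyr's extract() also takes a convert argument. A minimal sketch, reusing the same pattern written with `\\w+` (assuming the spaces in the scraped pattern are lost `+` signs):

```r
library(tidyr)

file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')
nms <- c("Site", "Plot", "Camera", "Year", "Month", "Day", "Hour", "Minute", "Second")

# convert = TRUE runs type.convert() on the new columns, so "06" becomes 6
out <- extract(data.frame(file.names), file.names, into = nms,
               regex = '(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})',
               convert = TRUE)
out
```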

In base R:

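# strcapture() builds a data frame from the capture groups using an all-character prototype;
# type.convert() then turns the numeric-looking columns into integers
type.convert(strcapture('(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})',
                        file.names, as.list(setNames(character(length(nms)), nms))), as.is = TRUE)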

 Site Plot Camera Year Month Day Hour Minute Second
1 S123   P2    C10 2012     6  21   21     34     22
2  S10   P1     C1 2012     6  22    5      1     48
3 S187   P2     C2 2012     7   2    2     35      1

Answer:

I don't know anything about str_sub or str_split other than that they may be efforts to adapt the sub and strsplit functions to an alternate universe. I only learned base R and have not really seen the need to learn a new syntax. Here's a base R solution for the date:

as.Date( sub( "([^_]+[_])(\\d{8})(.+$)", "\\2", file.names) , format="%Y%m%d")
[1] "2012-06-21" "2012-06-22" "2012-07-02"

You can read the sub pattern as:

1) Beginning at the start of the string, collect all the non-underscore characters, plus the underscore that ends them, into the first capture group.
2) Then get the next 8 digits (if they exist) into a second capture group.
3) Everything that follows goes into a third capture group.
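To see what each capture group in that pattern actually holds, base R's regexec() and regmatches() can print the groups for every file name. A minimal sketch, with the pattern written using `+` quantifiers (assumed to be what the scraped version lost):

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

pattern <- "([^_]+[_])(\\d{8})(.+$)"

# One character vector per file name:
# element 1 = whole match, 2 = group 1, 3 = group 2 (the date), 4 = group 3
groups <- regmatches(file.names, regexec(pattern, file.names))
groups[[1]]

# Keeping only group 2 and converting gives the same dates as the sub() call
dates <- as.Date(vapply(groups, function(g) g[3], character(1)), format = "%Y%m%d")
dates
```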

The substitution just returns the contents of the second capture group. The conversion to Date values is straightforward; I'm assuming that should be clear from the code, but if not, see ?as.Date.

Here's the rest, including the time:

as.POSIXct( sub( "([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names) , format="%Y%m%d %H%M%S")
[1] "2012-06-21 21:34:22 PDT" "2012-06-22 05:01:48 PDT" "2012-07-02 02:35:01 PDT"

If you want the individual components broken out, convert to POSIXlt and extract the elements of the resulting list.
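A minimal sketch of that POSIXlt breakout, reusing the sub() pattern with `+` quantifiers. The tz = "UTC" argument is an assumption added so the components do not depend on the local time zone. Note that POSIXlt counts years from 1900 and months from 0, hence the offsets.

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# "YYYYMMDD HHMMSS" strings, e.g. "20120621 213422"
stamps <- sub("([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names)
lt <- as.POSIXlt(stamps, format = "%Y%m%d %H%M%S", tz = "UTC")

# Each component of the POSIXlt list becomes a numeric column
parts <- data.frame(
  Year   = lt$year + 1900,  # years since 1900
  Month  = lt$mon + 1,      # months are 0-based
  Day    = lt$mday,
  Hour   = lt$hour,
  Minute = lt$min,
  Second = lt$sec
)
parts
```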