The following file names were used in a camera trap study. The S number represents the site, P is the plot within a site, C is the camera number within the plot, the first string of numbers is the YearMonthDay and the second string of numbers is the HourMinuteSecond.
file.names <- c( 'S123.P2.C10_20120621_213422.jpg',
'S10.P1.C1_20120622_050148.jpg',
'S187.P2.C2_20120702_023501.jpg')
file.names
Use a combination of str_sub() and str_split() to produce a data frame with columns corresponding to the site, plot, camera, year, month, day, hour, minute, and second for these three file names. So we want to produce code that will create the data frame:
Site | Plot | Camera | Year | Month | Day | Hour | Minute | Second |
---|---|---|---|---|---|---|---|---|
S123 | P2 | C10 | 2012 | 06 | 21 | 21 | 34 | 22 |
S10 | P1 | C1 | 2012 | 06 | 22 | 05 | 01 | 48 |
S187 | P2 | C2 | 2012 | 07 | 02 | 02 | 35 | 01 |
My code is below:
file.names %>%
str_sub(start = 1, end = -5) %>%
str_replace_all("_", ".") %>%
str_split(pattern = fixed("."), n = 5)
I have no idea how to split the date and time fields into their components.
CodePudding user response:
nms <- c("Site", "Plot", "Camera", "Year", "Month", "Day", "Hour", "Minute", "Second")
library(tidyverse)
data.frame(file.names) %>%
extract(file.names, nms,
'(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})')
Site Plot Camera Year Month Day Hour Minute Second
1 S123 P2 C10 2012 06 21 21 34 22
2 S10 P1 C1 2012 06 22 05 01 48
3 S187 P2 C2 2012 07 02 02 35 01
In base R:
type.convert(strcapture('(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})',
file.names, as.list(setNames(character(length(nms)), nms))), as.is = TRUE)
Site Plot Camera Year Month Day Hour Minute Second
1 S123 P2 C10 2012 6 21 21 34 22
2 S10 P1 C1 2012 6 22 5 1 48
3 S187 P2 C2 2012 7 2 2 35 1
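Since the question asks specifically for str_sub() and str_split(), here is a minimal sketch of that route as well. It assumes every file name ends in a four-character extension such as ".jpg" and that the date and time fields are always 8 and 6 digits long; the intermediate names (stem, parts, date, time, result) are only illustrative.

```r
library(stringr)

file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Drop the 4-character extension (".jpg"), then turn "_" into "."
# so that a single split on "." separates all five fields.
stem  <- str_sub(file.names, start = 1, end = -5)
parts <- str_split(str_replace_all(stem, "_", "."), fixed("."), simplify = TRUE)

# parts is a character matrix with columns: site, plot, camera, YYYYMMDD, HHMMSS
date <- parts[, 4]
time <- parts[, 5]

# Fixed-width fields are cut by position with str_sub()
result <- data.frame(
  Site   = parts[, 1], Plot   = parts[, 2], Camera = parts[, 3],
  Year   = str_sub(date, 1, 4), Month  = str_sub(date, 5, 6), Day    = str_sub(date, 7, 8),
  Hour   = str_sub(time, 1, 2), Minute = str_sub(time, 3, 4), Second = str_sub(time, 5, 6),
  stringsAsFactors = FALSE
)
result
```

Every column stays character here, so the leading zeros in the requested table are preserved.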
CodePudding user response:
I don't know anything about str_sub or str_split other than the fact that they may be efforts to adapt the sub and strsplit functions to an alternate universe. I just learned base R and have not really seen the need to learn a new syntax. Here's a base solution:
as.Date( sub( "([^_]+[_])(\\d{8})(.+$)", "\\2", file.names) , format="%Y%m%d")
[1] "2012-06-21" "2012-06-22" "2012-07-02"
You can read the sub pattern as:
1) beginning at the start of the string, collect all the non-underscore characters, plus the underscore that follows them, into the first capture group
2) then get the next 8 digits (if they exist) in a second capture group
3) and everything that follows will be in a third capture group
The substitution just returns the contents of the second capture group. The conversion to Date values is straightforward. I'm assuming that should be clear from the code, but if not then see ?as.Date.
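To see what lands in each capture group, the three steps can be checked with regexec() and regmatches(). This is only a sketch: the pattern below is my reading of the three steps described above (one or more non-underscore characters plus an underscore, then eight digits, then the remainder), and the variable names are illustrative.

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Assumed pattern: group 1 = prefix up to and including the first "_",
# group 2 = the 8-digit date, group 3 = whatever follows
pattern <- "([^_]+[_])(\\d{8})(.+$)"

# regexec() records the full match and each group; regmatches() extracts them
groups <- regmatches(file.names, regexec(pattern, file.names))

# Element 1 is the full match, elements 2 to 4 are the three capture groups
groups[[1]]
# [1] "S123.P2.C10_20120621_213422.jpg" "S123.P2.C10_" "20120621" "_213422.jpg"
```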
Here's the rest, with the time included:
as.POSIXct( sub( "([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names) , format="%Y%m%d %H%M%S")
[1] "2012-06-21 21:34:22 PDT" "2012-06-22 05:01:48 PDT" "2012-07-02 02:35:01 PDT"
If you want the components broken out, convert to POSIXlt and extract them from the resulting list.
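A rough sketch of that last step follows. The tz = "UTC" argument is my assumption, added so the result does not depend on the local time zone; the cameras' actual time zone is not stated in the question. The names stamp, lt and parts are illustrative.

```r
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# Reduce each name to "YYYYMMDD HHMMSS"
stamp <- sub("([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names)

# Assumption: treat the timestamps as UTC
lt <- as.POSIXlt(stamp, format = "%Y%m%d %H%M%S", tz = "UTC")

# POSIXlt stores years since 1900 and months as 0-11, so adjust both
parts <- data.frame(Year = lt$year + 1900, Month = lt$mon + 1, Day = lt$mday,
                    Hour = lt$hour, Minute = lt$min, Second = lt$sec)
parts
```

The components come back as numbers, so leading zeros are dropped; something like sprintf("%02d", lt$mday) restores them if the zero-padded strings from the question are needed.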