I have file names in the following format:
Incomplete-Provider-Apr18-revised-XLS-8187K.xls
How can I extract the Apr18 part from the file name, and ideally turn it into a date like 2018-04-01?
I have tried things like str_extract with a vector of month names, but that does not seem to work.
CodePudding user response:
Here is one way using gsub from base R:
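# Use the C locale so that "Apr" is parsed as an English month abbreviation
Sys.setlocale("LC_TIME", "C")
x <- "Incomplete-Provider-Apr18-revised-XLS-8187K.xls"
# Rewrite the name as "Apr-18-01", then parse it (%b = abbreviated month name)
as.Date(gsub(".*-(\\w{3})(\\d{2})-.*", "\\1-\\2-01", x), format = "%b-%y-%d")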
[1] "2018-04-01"
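Basically, I use a regular expression to extract the date from the file name.
It assumes the date always appears as three word characters, \\w{3}, followed by two digits, \\d{2}, with a hyphen on either side. The match is rewritten as Apr-18-01, which as.Date can then parse.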
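If you have a whole vector of file names, the same idea can be wrapped in a small function. This is only a sketch: extract_month_date is a hypothetical helper name, not something from base R, and it assumes English three-letter month abbreviations and years in the 2000s. It looks the month up in the built-in month.abb vector, so it does not depend on the LC_TIME setting, and it returns NA for names that have no such date part.

```r
# Hypothetical helper (not part of base R): first-of-month date from file names.
# Assumes English month abbreviations ("Apr") and two-digit years meaning 20xx.
extract_month_date <- function(files) {
  # regexec() gives the full match plus the two capture groups for each name
  parts <- regmatches(files, regexec("-([A-Za-z]{3})(\\d{2})-", files))
  iso <- vapply(parts, function(p) {
    if (length(p) < 3) return(NA_character_)   # no "-Mmmyy-" part in this name
    m <- match(p[2], month.abb)                # "Apr" -> 4, unknown -> NA
    if (is.na(m)) return(NA_character_)
    sprintf("20%s-%02d-01", p[3], m)           # e.g. "2018-04-01"
  }, character(1))
  as.Date(iso, format = "%Y-%m-%d")
}

extract_month_date(c("Incomplete-Provider-Apr18-revised-XLS-8187K.xls",
                     "no-date-here.xls"))
# [1] "2018-04-01" NA
```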
CodePudding user response:
Or with stringr::str_extract:
s <- "Incomplete-Provider-Apr18-revised-XLS-8187K.xls"
as.Date(paste(stringr::str_extract(s, '(\\w+\\d+)'), '01'), format='%b%y %d')
[1] "2018-04-01"