Extract abbreviated month and year from filename

Time:10-21

I have file names in the following format:

Incomplete-Provider-Apr18-revised-XLS-8187K.xls

How can I extract the Apr18 part from the file name and, ideally, turn it into something like 2018-04-01?

I have tried things like str_extract, using a vector of month names, but that does not seem to work.

CodePudding user response:

Here is one way using gsub from base R:

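# Use the C locale so that %b matches English month abbreviations
Sys.setlocale("LC_TIME", "C")

x <- "Incomplete-Provider-Apr18-revised-XLS-8187K.xls"
# Rewrite the name as "Apr-18-01", then parse it (%b = abbreviated month name)
as.Date(gsub(".*-(\\w{3})(\\d{2})-.*", "\\1-\\2-01", x), format = "%b-%y-%d")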

[1] "2018-04-01"

Basically, I use a regular expression to extract the date from the file name. It assumes the date always appears as a three-letter month abbreviation, matched by \\w{3}, followed by a two-digit year, matched by \\d{2}, with a hyphen on either side.
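To see what the two capture groups actually hold, the same pattern can be run through regexec and regmatches. This is only a sketch: the second and third file names are made up for illustration, and the variable names are not from the answer above.

```r
# Sketch: inspect the capture groups of the pattern for a few file names.
# Only the first name comes from the question; the others are invented.
files <- c(
  "Incomplete-Provider-Apr18-revised-XLS-8187K.xls",
  "Incomplete-Provider-Dec19-revised-XLS-1234K.xls",
  "no-date-in-this-name.xls"
)
pattern <- ".*-(\\w{3})(\\d{2})-.*"

# regexec() returns the full match plus one entry per capture group;
# a name that does not match yields an empty character vector.
groups <- regmatches(files, regexec(pattern, files))

# Group 1 is the month abbreviation, group 2 the two-digit year.
month_part <- vapply(groups, function(g) if (length(g) == 3) g[2] else NA_character_, character(1))
year_part  <- vapply(groups, function(g) if (length(g) == 3) g[3] else NA_character_, character(1))
```

Names that do not contain a hyphen-delimited month and year end up as NA in both vectors, which makes non-matching files easy to spot before converting to dates.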

CodePudding user response:

Or with stringr::str_extract:

> s <- "Incomplete-Provider-Apr18-revised-XLS-8187K.xls"
> as.Date(paste(stringr::str_extract(s, '(\\w+\\d+)'), '01'), format='%b%y %d')
[1] "2018-04-01"
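Both answers parse the month with %b, which depends on the LC_TIME locale (hence the Sys.setlocale call in the first answer). The question mentions trying a vector of month names; one way that idea might be made to work, without touching the locale, is to build the pattern from base R's month.abb. This is a minimal sketch: month_token_to_date is a hypothetical helper name, and it assumes every two-digit year belongs to the 2000s.

```r
# Hypothetical helper (not from either answer): parse "Apr18"-style tokens
# by looking the month up in month.abb instead of relying on the locale.
month_token_to_date <- function(filenames) {
  # Alternation of the English abbreviations: "(Jan|Feb|...|Dec)(\\d{2})"
  pattern <- paste0("(", paste(month.abb, collapse = "|"), ")(\\d{2})")
  matches <- regmatches(filenames, regexec(pattern, filenames))

  iso <- vapply(matches, function(parts) {
    if (length(parts) < 3) return(NA_character_)  # no month-year token found
    month <- match(parts[2], month.abb)           # "Apr" -> 4
    year <- 2000L + as.integer(parts[3])          # assumption: years are 20xx
    sprintf("%04d-%02d-01", year, month)
  }, character(1))

  as.Date(iso, format = "%Y-%m-%d")
}

dates <- month_token_to_date(c(
  "Incomplete-Provider-Apr18-revised-XLS-8187K.xls",
  "no-date-in-this-name.xls"
))
```

Note that the year assumption differs from %y, which maps 69 to 99 onto the 1900s; adjust the offset if the files can be older than 2000.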