Extracting a string/numbers from a file name

I need to extract a specific date from a set of file names. I have found that the following code can help:

dates <- unique(gsub(pattern = "xxxxxx", replacement = "xxxx", x = filenames))

Example file name: LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF

Date to extract: 20180705

Can anyone please tell me what to fill in for pattern and replacement in the above code?

CodePudding user response:

If the underscores are always in the same positions, this is enough (here str holds a single file name):

unlist(strsplit(str, "_"))[4]
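Note that unlist() flattens everything into one vector, so the one-liner above only returns the date of the first file when given several names. A minimal sketch of a vectorized variant, assuming every name has at least four underscore-separated fields (the second file name is made up for illustration):

```r
# Hypothetical sample file names; the second one is invented for this example.
filenames <- c(
  "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF",
  "LC08_L1TP_211048_20180721_20180731_01_T1_2018-07-21_B5.TIF"
)

# strsplit() returns one character vector of fields per file name.
fields <- strsplit(filenames, "_", fixed = TRUE)

# Pick the 4th field from each; vapply() enforces one string per file name.
dates <- vapply(fields, function(parts) parts[4], character(1))
dates
```

Wrapping the result in unique() would then give the distinct dates, as in the original question.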

CodePudding user response:

Assuming you want the 4th underscore-separated field, try these. They also work if x is a vector.

1) This uses read.table.

x <- "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF"

read.table(text = x, sep = "_")[[4]]
## [1] 20180705

2) Or use sub with a regular expression, which also works when x is a vector:

sub("^([[:alnum:]]+_){3}(\\d+)_.*", "\\2", x)
## [1] "20180705"

3) If the date always appears in character positions 18 through 25 then:

substring(x, 18, 25)
## [1] "20180705"

4) If instead we assume that we want the first occurrence of 8 digits following an underscore, then:

sub("^.*?_(\\d{8}).*", "\\1", x)
## [1] "20180705"
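To tie this back to the code in the question, here is a sketch that plugs the pattern from 4) into the original gsub call. The file names other than the one in the question are made up for illustration, and it assumes the wanted date is always the first run of 8 digits after an underscore:

```r
# Hypothetical input: two bands from the same scene plus one later scene.
filenames <- c(
  "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B4.TIF",
  "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF",
  "LC08_L1TP_211048_20180721_20180731_01_T1_2018-07-21_B5.TIF"
)

# The pattern matches the whole name and captures the 8-digit date;
# the replacement "\\1" keeps only that captured group.
dates <- unique(gsub(pattern = "^.*?_(\\d{8}).*", replacement = "\\1", x = filenames))
dates
```

Because the pattern is anchored and consumes the whole string, sub would behave the same as gsub here. unique() collapses the two bands of the same scene into one date.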