Extracting a string/numbers from a file name

Time:10-21

I have to extract a specific date from a bunch of file names. I have found that the following code can help with it:

dates <- unique(gsub(pattern = "xxxxxx", replacement = "xxxx", x = filenames))

Example file name: LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF

Date to extract: 20180705

Can anyone please tell me what to fill in for pattern and replacement in the above code?

CodePudding user response:

If the underscores are always in the same place, this is enough (here str holds a single file name):

unlist(strsplit(str, "_"))[4]
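
The one-liner above indexes into a flattened result, so it only returns the date of the first file name when given several. A minimal sketch of a vectorised variant follows; the vector name filenames comes from the question, and the second file name is invented purely for illustration.

```r
# Hypothetical input: the second file name is made up for this example.
filenames <- c(
  "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF",
  "LC08_L1TP_211048_20180721_20180731_01_T1_2018-07-21_B5.TIF"
)

# Split every name on the literal "_" and keep the 4th piece of each.
parts <- strsplit(filenames, "_", fixed = TRUE)
dates <- vapply(parts, function(p) p[4], character(1))

unique(dates)
## [1] "20180705" "20180721"
```

vapply is used instead of sapply so that the result is guaranteed to be a character vector.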

CodePudding user response:

Assuming you want the 4th underscore-separated field, try one of these. They also work if x is a vector.

1) This uses read.table.

x <- "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF"

read.table(text = x, sep = "_")[[4]]
## [1] 20180705
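
Note that read.table converts the field to a number here, which is why the output has no quotes. If a character result is preferred (for example, to match the other approaches), one possible sketch is to pass colClasses = "character", a standard read.table argument:

```r
x <- "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF"

# colClasses = "character" keeps every field as text instead of guessing types,
# so the 4th column comes back as a string rather than an integer.
d <- read.table(text = x, sep = "_", colClasses = "character")[[4]]
d
## [1] "20180705"
```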

2) Alternatively, use sub with a regular expression; this also works if x is a vector:

sub("^([[:alnum:]]+_){3}(\\d+)_.*", "\\2", x)
## [1] "20180705"

3) If the date always appears in character positions 18 through 25 then:

substring(x, 18, 25)
## [1] "20180705"

4) If instead we assume that we want the first occurrence of 8 digits following an underscore, then:

sub("^.*?_(\\d{8}).*", "\\1", x)
## [1] "20180705"
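
To connect this back to the question, the same kind of pattern can be dropped into the original gsub call. The sketch below assumes the first run of 8 digits after an underscore is the wanted date; the extra file names are invented for illustration, and perl = TRUE is an optional choice made here so that the lazy quantifier follows Perl-style semantics.

```r
# Hypothetical input: only the first file name comes from the question.
filenames <- c(
  "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B5.TIF",
  "LC08_L1TP_211048_20180705_20180717_01_T1_2018-07-05_B4.TIF",
  "LC08_L1TP_211048_20180721_20180731_01_T1_2018-07-21_B5.TIF"
)

# The pattern matches the whole name and captures the first 8-digit field;
# the replacement "\\1" keeps only that captured group.
dates <- unique(gsub(pattern = "^.*?_(\\d{8}).*", replacement = "\\1",
                     x = filenames, perl = TRUE))
dates
## [1] "20180705" "20180721"

# Optional: convert the strings to Date objects.
as.Date(dates, format = "%Y%m%d")
## [1] "2018-07-05" "2018-07-21"
```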
Tags: r