Pattern match by pattern and position in R

Time:08-11

I have a bunch of files with unique IDs that encode state, latitude, longitude, year, month, and day. All filenames/IDs have the same length, e.g. fl294670818202019.

I'd like to use pattern matching to subset a list by year. The following code does not work as desired because the 'year' pattern can also be matched by digits that span longitude and year and/or year and month (as in the example above).

Example:

# unique ID with year 2020
x <- "fl301330850282020"
# unique ID with year 2019 (but also matches the pattern 2020)
y <- "fl294670818202019"

# create a list 
(z <- list(x,y))

# subset list by pattern (stringr provides str_subset and re-exports %>%)
library(stringr)
z %>% 
  str_subset(pattern = "2020")

Is it possible to skip the first 13 characters, and then perform the search?

I don't want to subset/remove the first 13 characters from the filename because I need the information contained within the filename.

CodePudding user response:

Is the year always the last four characters? If so, how about:

z[str_ends(z,"2020")]

or:

z[grepl("2020$",z)]
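To see why anchoring helps, here is a minimal base R sketch on the two IDs from the question (no stringr needed). The variable names unanchored, anchored and suffix are just for illustration, and endsWith() assumes R 3.3.0 or later.

```r
# Example IDs from the question
x <- "fl301330850282020"  # year 2020
y <- "fl294670818202019"  # year 2019, but "2020" also appears earlier in the string
z <- list(x, y)

# Unanchored search: matches wherever "2020" occurs, so both IDs hit
unanchored <- grepl("2020", z)

# "$" anchors the pattern to the end of the string
anchored <- grepl("2020$", z)

# Same idea without a regex; unlist() turns the list into a character vector
suffix <- endsWith(unlist(z), "2020")

# Subset the original list; the IDs themselves are not modified
z[anchored]
```

Only the anchored versions separate the two IDs, and the list elements keep their full filenames.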

If you want to be explicit about skipping the first 13 characters, you can do this:

z[grepl("2020", str_sub(z,14))]    

or

z[str_detect(str_sub(z,14),"2020")]

or even

z[grepl("(?<=.{13})2020", z, perl = TRUE)]
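
Since the IDs are fixed width, another option is to compare the year field directly instead of searching for a pattern. A rough sketch in base R, assuming the year sits in characters 14 to 17 as in your two examples; id_year is a made-up helper name, not a function from any package.

```r
# Hypothetical helper: extract the year field from fixed-width IDs.
# Assumes the year occupies characters 14-17 (true for the question's examples).
id_year <- function(ids, start = 14, stop = 17) {
  substr(unlist(ids), start, stop)
}

z <- list("fl301330850282020", "fl294670818202019")

# Year of each ID as a character vector
years <- id_year(z)

# Exact comparison on the year field; the full IDs are left untouched
z_2020 <- z[years == "2020"]
z_2020
```

This also makes it easy to subset on several years at once, e.g. z[years %in% c("2019", "2020")].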