How to extract specific characters using str_extract() in R

Time:09-28

Context

I have a character vector a.

I want to extract the text between the last slash (/) and the .nc extension using the str_extract() function.

I tried str_extract(a, "(?=/).*(?=.nc)"), but it returns everything from the first slash onward instead of only the file name.

Question

How can I get the text between the last slash and .nc for each element of the character vector a?

Reproducible code

a = c(
  'data/temp/air/pm2.5/pm2.5_year_2014.nc',
  'data/temp/air/pm10/pm10_year_2014.nc',
  'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)

# My attempt (does not give the expected output)

library(stringr)
str_extract(a, "(?=/).*(?=.nc)")
# [1] "/temp/air/pm2.5/pm2.5_year_2014"       
# [2] "/temp/air/pm10/pm10_year_2014"         
# [3] "/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe"


# The expected output should look like this:

# [1] "pm2.5_year_2014"       
# [2] "pm10_year_2014"         
# [3] "ss_fef_10233_dfdfe"

Answer:

Here is a regex replacement approach:

a = c(
    'data/temp/air/pm2.5/pm2.5_year_2014.nc',
    'data/temp/air/pm10/pm10_year_2014.nc',
    'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)
# Remove the leading path (up to the last "/") and the trailing extension
output <- gsub(".*/|\\.[^.]+$", "", a)
output

[1] "pm2.5_year_2014"    "pm10_year_2014"     "ss_fef_10233_dfdfe"

Here is the regex logic:

  • .*/ matches everything from the start of the string up to and including the last / (.* is greedy, so it runs to the final slash)
  • | OR
  • \.[^.]+$ matches the final dot and everything after it, up to the end of the string

We then replace both matches with the empty string, which removes them and leaves only the file names without their extensions.
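Since the question asks specifically about str_extract(), here is a minimal sketch of that route, assuming the stringr package is installed. The original pattern fails for two reasons: (?=/) is a zero-width lookahead, so it only requires the match to start just before a slash (the first one it finds), and the unescaped . in .nc matches any character. Matching a run of non-slash characters that is followed by a literal .nc at the end of the string avoids both problems. The variable names stems and stems_base are only illustrative, and the base R line is an alternative technique, not part of the answer above.

```r
library(stringr)

a <- c(
  'data/temp/air/pm2.5/pm2.5_year_2014.nc',
  'data/temp/air/pm10/pm10_year_2014.nc',
  'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)

# [^/]+        one or more characters that are not a slash
# (?=\\.nc$)   lookahead: must be followed by a literal ".nc" at the end
stems <- str_extract(a, "[^/]+(?=\\.nc$)")
stems
# [1] "pm2.5_year_2014"    "pm10_year_2014"     "ss_fef_10233_dfdfe"

# Alternative without regular expressions, using base R only:
# basename() drops the directory part, file_path_sans_ext() drops ".nc"
stems_base <- tools::file_path_sans_ext(basename(a))
stems_base
# [1] "pm2.5_year_2014"    "pm10_year_2014"     "ss_fef_10233_dfdfe"
```

The lookahead version returns NA for any element that does not end in .nc, which can be a useful signal that an input has an unexpected extension; the gsub() approach would instead strip whatever extension is present.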
