Context
I have a character vector a. I want to extract the text between the last slash (/) and the .nc extension using the str_extract() function.
I have tried str_extract(a, "(?=/).*(?=.nc)"), but it failed.
Question
How can I get the text between the last slash and .nc in the character vector a?
Reproducible code
a = c(
'data/temp/air/pm2.5/pm2.5_year_2014.nc',
'data/temp/air/pm10/pm10_year_2014.nc',
'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)
# My attempt (failed); str_extract() comes from the stringr package
library(stringr)
str_extract(a, "(?=/).*(?=.nc)")
# [1] "/temp/air/pm2.5/pm2.5_year_2014"
# [2] "/temp/air/pm10/pm10_year_2014"
# [3] "/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe"
# The expected output should look like this:
# [1] "pm2.5_year_2014"
# [2] "pm10_year_2014"
# [3] "ss_fef_10233_dfdfe"
CodePudding user response:
Here is a regex replacement approach:
a = c(
'data/temp/air/pm2.5/pm2.5_year_2014.nc',
'data/temp/air/pm10/pm10_year_2014.nc',
'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)
output <- gsub(".*/|\\.[^.]+$", "", a)
output
[1] "pm2.5_year_2014" "pm10_year_2014" "ss_fef_10233_dfdfe"
Here is the regex logic:
.*/ matches everything from the start of the string up to and including the last /
| OR
\.[^.]+$ matches everything from the final dot to the end of the string
Then we replace these matches with an empty string to remove them, leaving behind the file names without their extensions.
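To make the two halves of the alternation easier to follow, here is a minimal sketch that applies them one at a time and compares the result with two other base R routes. The variable names (no_dir, stepwise, helpers, extracted) are only illustrative, and the lookahead pattern in the last step is my own suggestion rather than part of the answer above; it should also work as the pattern argument of str_extract(), though I have only written it here with base R functions.

```r
a <- c(
  "data/temp/air/pm2.5/pm2.5_year_2014.nc",
  "data/temp/air/pm10/pm10_year_2014.nc",
  "efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc"
)

# Step 1: ".*/" is greedy, so it consumes everything up to and
# including the LAST slash, leaving only the file name.
no_dir <- sub(".*/", "", a)

# Step 2: "\\.[^.]+$" removes the final dot and everything after it.
# The dot inside "pm2.5" survives because it is not the final dot.
stepwise <- sub("\\.[^.]+$", "", no_dir)

# Cross-check without regex: basename() drops the directories and
# tools::file_path_sans_ext() drops the extension.
helpers <- tools::file_path_sans_ext(basename(a))

# Extraction instead of removal: a run of non-slash characters that is
# followed by ".nc" at the end of the string (PCRE lookahead, hence perl = TRUE).
extracted <- regmatches(a, regexpr("[^/]+(?=\\.nc$)", a, perl = TRUE))

print(stepwise)
```

The original attempt failed for two reasons: (?=/) is a lookahead that only asserts a slash comes next without consuming it, so the match starts at the first slash, and the unescaped dot in (?=.nc) matches any character rather than a literal dot.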