I have a column of URLs, and I want to extract the last element of each URL, which represents an ID I am looking for.
I managed to use basename() to extract all the text after the last slash. Here is an example of the URL that I extracted
I want to extract that last number. I used this script, but it seems to extract only the first ID and copy it into all the other rows:
library(stringr)
library(dplyr)
df = read.csv('~/Downloads/urls.csv')
df = df %>%
  mutate(temp = str_split(string = url, pattern = '-')) %>%
  mutate(id = temp[[1]][length(temp[[1]])])
I expected the code above to give me an id variable with these values
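The reason the script repeats one value is that, inside `mutate()`, `temp` is the whole list column, so `temp[[1]]` is always the split of the first URL, and that single result is recycled down every row. A minimal base-R sketch of the per-row fix follows. The full URLs and the `example.com` host are hypothetical placeholders; only the slugs come from the answer's example.

```r
# Hypothetical full URLs (example.com is a placeholder host); the slugs
# are the two examples used in the answer.
urls <- c(
  "https://www.example.com/essential-back-pain-stretches-3120312",
  "https://www.example.com/what-is-myotome-296992"
)

# basename() keeps only the text after the last slash
slugs <- basename(urls)

# strsplit() returns a list with one character vector per URL
pieces <- strsplit(slugs, "-", fixed = TRUE)

# Take the last piece of EACH vector, rather than always pieces[[1]]
ids <- vapply(pieces, function(p) p[length(p)], character(1))
print(ids)
```

In the original pipeline, the same idea would replace the second `mutate()` with `mutate(id = vapply(temp, function(p) p[length(p)], character(1)))`, which walks over every element of the `temp` list column instead of indexing only the first.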
Answer:
Assuming the URLs are in a data frame column like this:
# A tibble: 2 x 1
url
<chr>
1 essential-back-pain-stretches-3120312
2 what-is-myotome-296992
you can extract the IDs with a regular expression that matches the run of non-hyphen characters at the end of each string:
df %>%
  mutate(id = str_extract(url, pattern = "([^-]+)$"))
# A tibble: 2 x 2
url id
<chr> <chr>
1 essential-back-pain-stretches-3120312 3120312
2 what-is-myotome-296992 296992
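If you would rather not depend on stringr, base R can do the same extraction. A minimal sketch, using the two example slugs from the table above as toy data:

```r
urls <- c("essential-back-pain-stretches-3120312", "what-is-myotome-296992")

# Greedy ".*-" matches everything up to and including the LAST hyphen,
# so replacing it with "" leaves only the trailing ID.
ids <- sub(".*-", "", urls)

# Same pattern as the stringr answer: a run of non-hyphen characters
# anchored at the end of the string.
ids_regmatches <- regmatches(urls, regexpr("[^-]+$", urls))

# Convert to integer if a numeric ID is wanted
ids_int <- as.integer(ids)
```

Both approaches return character vectors; `as.integer()` works here because these IDs fit in R's 32-bit integer range, but keeping them as character is safer if some IDs could be longer or have leading zeros.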