How to extract the last ID from a URL


I have a column of URLs, and I want to extract the last element of each URL, which is the ID I am looking for.

I managed to use 'basename' to extract all the text after the last slash, which leaves strings like what-is-myotome-296992.

I want to extract that last number. I used this script, but it seems to extract just the first ID and copy it into every other row.

df = read.csv('~/Downloads/urls.csv')

df = df %>%
  mutate(temp = str_split(string = url,pattern = '-')) %>%
  mutate(id = temp[[1]][length(temp[[1]])])

I used the code above and expected to get an id variable holding the trailing number of each URL.

CodePudding user response:

Assuming this column is in a data frame:

# A tibble: 2 x 1
  url                                  
  <chr>                                
1 essential-back-pain-stretches-3120312
2 what-is-myotome-296992               

Extracting IDs with regex

# [^-]+$ matches the run of non-hyphen characters at the end of each string
df %>%
  mutate(id = str_extract(url, pattern = "([^-]+)$"))

# A tibble: 2 x 2
  url                                   id     
  <chr>                                 <chr>  
1 essential-back-pain-stretches-3120312 3120312
2 what-is-myotome-296992                296992 
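
As for why the attempt in the question repeats the first ID: inside mutate, temp is the whole list column, so temp[[1]] always refers to the first row's pieces, and that single value is recycled down the column. Below is a minimal sketch of a per-row version, assuming base R only and a toy data frame that mirrors the two rows above; the names parts and id_sub are just illustrative.

```r
# Toy data mirroring the two example rows (an assumption for illustration)
df <- data.frame(
  url = c("essential-back-pain-stretches-3120312", "what-is-myotome-296992"),
  stringsAsFactors = FALSE
)

# Split every URL on "-": a list with one character vector per row
parts <- strsplit(df$url, "-", fixed = TRUE)

# Take the last piece of each vector, row by row
df$id <- vapply(parts, function(p) p[length(p)], character(1))

# Alternative without splitting: drop everything up to the last hyphen
df$id_sub <- sub(".*-", "", df$url)
```

Both columns should hold the same values here. If the trailing piece must be numeric, the pattern "\\d+$" in str_extract is a stricter alternative, and it returns NA for URLs that do not end in digits.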