Using tryCatch to replace urls and get final url from website in R

Time:12-23

I have a data frame with a column "URLs" that contains 23k redirect URLs. I want to resolve each redirect to its final URL and store the results in a new column. However, some of the original URLs are no longer valid and throw an error, so I want to wrap the request in tryCatch. Since I am still a beginner in R, I am not sure how to write this correctly.

I used dput on the first few rows of my "URLs" column and added one deliberately invalid URL at the end:

c("https://icoholder.com/en/v2/ico/ico-redirect/4321?to=https://sirinlabs.com?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1036136?to=https://dash2trade.com?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1035284?to=https://impt.io?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1030235?to=https://calvaria.io?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1011041?to=https://artyfact.art?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1031430?to=https://www.projectnexus.app?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1005962?to=https://seedon.io?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1033498?to=https://vicuna.network?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1036409?to=https://cryptoffer.io/?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/23905?to=http://www.bitcoin.org/?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1450?to=https://ethereum.org?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/17581?to=https://telegram.org?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/1009688?to=https://egoco.in/?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/19163?to=https://lapo.io?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/20971?to=https://ingotcoin.io?utm_source=icoholder", 
"https://icoholder.com/en/v2/ico/ico-redirect/26401?to=https://restotoken.org?utm_source=icoholder",
"https://icoholder.com/en/v2/ico/ico-redirect/4321?to=https://ccc"
)

and the code I am playing around with currently looks like this:

library(httr)

df$URLs <- tryCatch(sapply(df$URLs, function(x) GET(x)$url), error = function(e) return(NULL))

I have seen questions such as "How to write trycatch in R" that explain how to use tryCatch, but I am not sure how to adapt them to my specific case. I would be grateful for any tips and code adaptations!

CodePudding user response:

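Instead of tryCatch(), I used possibly() from purrr, which does much the same thing: it wraps a function so that, if the call throws an error, the value given in otherwise (here NA_character_) is returned instead. In my example the column is called links rather than URLs.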

library(tidyverse)
library(httr)

df %>%
  mutate(final_url = map_chr(
    links,
    possibly( ~ .x %>% 
                GET() %>% 
                pluck("url"), 
              otherwise = NA_character_)
  ))

# A tibble: 17 x 2
   links                                                           final~1
   <chr>                                                           <chr>  
 1 https://icoholder.com/en/v2/ico/ico-redirect/4321?to=https:/~ https:~
 2 https://icoholder.com/en/v2/ico/ico-redirect/1036136?to=https%~ https:~
 3 https://icoholder.com/en/v2/ico/ico-redirect/1035284?to=https%~ https:~
 4 https://icoholder.com/en/v2/ico/ico-redirect/1030235?to=https%~ https:~
 5 https://icoholder.com/en/v2/ico/ico-redirect/1011041?to=https%~ https:~
 6 https://icoholder.com/en/v2/ico/ico-redirect/1031430?to=https%~ https:~
 7 https://icoholder.com/en/v2/ico/ico-redirect/1005962?to=https%~ https:~
 8 https://icoholder.com/en/v2/ico/ico-redirect/1033498?to=https%~ https:~
 9 https://icoholder.com/en/v2/ico/ico-redirect/1036409?to=https%~ https:~
10 https://icoholder.com/en/v2/ico/ico-redirect/23905?to=http:/~ https:~
11 https://icoholder.com/en/v2/ico/ico-redirect/1450?to=https:/~ https:~
12 https://icoholder.com/en/v2/ico/ico-redirect/17581?to=https:~ https:~
13 https://icoholder.com/en/v2/ico/ico-redirect/1009688?to=https%~ https:~
14 https://icoholder.com/en/v2/ico/ico-redirect/19163?to=https:~ https:~
15 https://icoholder.com/en/v2/ico/ico-redirect/20971?to=https:~ https:~
16 https://icoholder.com/en/v2/ico/ico-redirect/26401?to=https:~ http:/~
17 https://icoholder.com/en/v2/ico/ico-redirect/4321?to=https:/~ NA     
# ... with abbreviated variable name 1: final_url
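
For comparison, here is a rough base R sketch of the same idea with tryCatch(). The attempt in the question wraps the whole sapply() call, so a single failing URL makes tryCatch() return NULL for everything, and assigning NULL to df$URLs drops the column. Wrapping each request separately avoids that. The helper names final_url and resolve_all and the fetch argument are my own, made up for illustration (fetch defaults to httr::GET and exists only so the function can be tried without network access). I have not run this against the real 23k URLs.

```r
# Resolve one URL to its final (post-redirect) address.
# `fetch` is a hypothetical parameter; it defaults to httr::GET, so the
# httr package must be installed when the default is used.
final_url <- function(url, fetch = httr::GET) {
  tryCatch(
    fetch(url)$url,                      # final URL after redirects
    error = function(e) NA_character_    # failed request -> NA for this URL only
  )
}

# Apply the helper element by element, so one bad URL only affects its own row.
resolve_all <- function(urls, fetch = httr::GET) {
  vapply(urls, final_url, character(1), fetch = fetch, USE.NAMES = FALSE)
}

# Intended use with the data frame from the question (needs network access):
# df$final_url <- resolve_all(df$URLs)
```

Writing the result to a new column such as final_url, rather than back to URLs, keeps the original redirects available for checking the rows that came back as NA.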