Looping over multiple URLs with a CSV database and regex

Time:12-13

I am trying to get a list of URLs from a parent database that are constructed of two different keys:

  1. The relevant key to the sub folder
  2. The pertinent key to the specific file

The sub folder key is held in a CSV file and so the concatenation is relatively simple:

csv <- read.csv("raw.csv")
vec <- csv$col1

URLs <- paste0("https://www.abc-corp.com/data/", vec, "/gotit/")

This returns a vector of partial URLs, but then it gets tricky.

The next section is a block of ten digits: five irrelevant ones, the year annotated in shorthand, then three more, e.g. https://www.abc-corp.com/data/1234/gotit/1234522123

I've tried:

URLs <- paste0("https://www.abc-corp.com/data/", vec, "/gotit/", "\d\d\d\d\d", "22", "\d\d\d")

and been told:

Error: '\d' is an unrecognized escape in character string starting ""\d\d"

It seems to be putting in an extra speech mark as well as not recognising the regex.

CodePudding user response:

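Inside an R string literal a backslash must itself be escaped, so you need two backslashes for each one the regex requires: write \\d rather than \d. The extra speech mark in the error is just R quoting the start of the offending string, not something added to it. Additionally, you may want to use quantifiers instead of repeating \d.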


URLs <- paste0("https://www.abc-corp.com/data/", vec, "/gotit/", "\\d{5}", "22", "\\d{3}")
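Note that paste0() only builds strings, so the result above is a vector of regex patterns rather than real URLs. A minimal sketch of how those patterns might be used follows, assuming you already have a vector of candidate URLs to filter. The keys in vec and the candidates vector are hypothetical stand-ins, and the ^ and $ anchors and the escaped dots are additions that are not in the original pattern.

```r
# Hypothetical sub-folder keys standing in for csv$col1
vec <- c("1234", "5678")

# One regex pattern per key; "\\d" in R source is the regex \d.
# Anchors and escaped dots are an assumption added for stricter matching.
patterns <- paste0("^https://www\\.abc-corp\\.com/data/", vec,
                   "/gotit/", "\\d{5}", "22", "\\d{3}$")

# Hypothetical candidate URLs, e.g. collected from an index page
candidates <- c(
  "https://www.abc-corp.com/data/1234/gotit/1234522123",
  "https://www.abc-corp.com/data/1234/gotit/1234521123",
  "https://www.abc-corp.com/data/5678/gotit/9999922000"
)

# Keep only the candidates that match at least one pattern
combined <- paste(patterns, collapse = "|")
matched <- candidates[grepl(combined, candidates)]
print(matched)
```

Here the second candidate is dropped because its year digits are 21 rather than 22.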
