I want a function to validate URLs, and I found this one:
is_valid_url <- function(string) {
  any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", string))
}
I tested it, and everything works fine except that URLs whose domain name starts with "s" return FALSE. Here are the results using R 4.2.0:
> any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", "www.example.com"))
[1] FALSE
> any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", "http://example.com"))
[1] TRUE
> any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", "https://example.com"))
[1] TRUE
> any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", "123"))
[1] FALSE
> any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", "https://science.org"))
[1] FALSE
> any(grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", "https://stackoverflow.com"))
[1] FALSE
Does anyone know how I can fix the regular expression so it correctly validates URLs whose domain name starts with "s"?
Thank you.
(FYI: the regular expression is the "@stephenhay" pattern from https://mathiasbynens.be/demo/url-regex.)
CodePudding user response:
In a regex, [^...] is a negated character class: it matches any single character that is not listed between the brackets. The catch here is that base R's default (POSIX/TRE) regex engine does not treat \s as a whitespace shorthand inside a character class; the backslash and the letter s are taken as two literal characters. So [^\\s/$.?#] excludes the literal letter s, which is exactly why every URL whose host begins with s fails to match.
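You can see the engine difference with a minimal pattern (perl = TRUE switches grepl() to PCRE, where \s inside a class does mean whitespace):
# Default (TRE) engine: [\s] matches a literal backslash or a literal "s"
grepl("[\\s]", "s")               #> TRUE
grepl("[\\s]", " ")               #> FALSE
# PCRE engine: [\s] is the whitespace shorthand, as intended
grepl("[\\s]", "s", perl = TRUE)  #> FALSE
grepl("[\\s]", " ", perl = TRUE)  #> TRUE
One fix is to drop \s from the leading class (a plain space is enough for this pattern) and use stringr::str_detect(), whose ICU engine also understands the \s in the trailing [^\\s]*: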
is_valid_url <- function(string) {
  pattern <- "(https?|ftp)://[^ /$.?#].[^\\s]*"
  stringr::str_detect(string, pattern)
}
url <- c(
  "www.example.com",
  "http://example.com",
  "https://example.com",
  "123",
  "https://science.org",
  "https://stackoverflow.com"
)
is_valid_url(url)
#> [1] FALSE TRUE TRUE FALSE TRUE TRUE
Created on 2022-10-04 by the reprex package (v2.0.1)
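If you would rather stay in base R, a sketch of an equivalent fix is to keep the original pattern and switch grepl() to the PCRE engine with perl = TRUE, where \s works inside character classes (same url test vector as above):
is_valid_url <- function(string) {
  # perl = TRUE: PCRE treats \s inside [^...] as "any whitespace"
  grepl("(https?|ftp)://[^\\s/$.?#].[^\\s]*", string, perl = TRUE)
}
is_valid_url(url)
#> [1] FALSE TRUE TRUE FALSE TRUE TRUE
Note that this version is vectorized over its input, like the stringr one; the any() wrapper in the original question would collapse a whole vector of URLs into a single TRUE/FALSE.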