Disable regular expression parsing in curl

Time:11-07

I am trying to curl an API that accepts a regular expression as one of the arguments.

I discovered that curl is trying to be smart here and is parsing the regular expression itself, rather than treating it as part of the URL. So every time I launch the curl command, it then proceeds to make 70,000 HTTP requests. It should only be making one.

Example: https://api.example.com/v1/?regex=(123[4-5][6-7][8-9])

In this case, I want the literal HTTP request to be, well, that.

Instead, I get:

[1/8]: https://api.example.com/v1/?regex=(123468) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123468)
curl: (6) Could not resolve host: api.example.com

[2/8]: https://api.example.com/v1/?regex=(123469) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123469)
curl: (6) Could not resolve host: api.example.com

[3/8]: https://api.example.com/v1/?regex=(123478) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123478)
curl: (6) Could not resolve host: api.example.com

[4/8]: https://api.example.com/v1/?regex=(123479) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123479)
curl: (6) Could not resolve host: api.example.com

[5/8]: https://api.example.com/v1/?regex=(123568) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123568)
curl: (6) Could not resolve host: api.example.com

[6/8]: https://api.example.com/v1/?regex=(123569) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123569)
curl: (6) Could not resolve host: api.example.com

[7/8]: https://api.example.com/v1/?regex=(123578) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123578)
curl: (6) Could not resolve host: api.example.com

[8/8]: https://api.example.com/v1/?regex=(123579) --> <stdout>
--_curl_--https://api.example.com/v1/?regex=(123579)
curl: (6) Could not resolve host: api.example.com

How can I disable this regex parsing that curl has?

I checked curl --help all and there is nothing there about this kind of parsing. I also checked the man page for curl. There is nothing there, either; this seems to be a completely undocumented "helpful" feature.

It does seem to be curl, not the shell, because this does not happen with wget, which makes only one (failing) request.

Trying to escape the brackets with \ seems to prevent the parsing, but also transforms the characters in a way which doesn't look right:

root@voip:/tmp# wget "https://api.example.com/v1/?regex=(123\[4-5\]\[6-7\]\[8-9\])"
--2021-11-05 11:26:03--  https://api.example.com/v1/?regex=(123\[4-5\]\[6-7\]\[8-9\])
Resolving api.example.com (api.example.com)... failed: Name or service not known.
wget: unable to resolve host address ‘api.example.com’
root@voip:/tmp# wget "https://api.example.com/v1/?regex=(123\[4-5]\[6-7]\[8-9])"
--2021-11-05 11:26:25--  https://api.example.com/v1/?regex=(123\[4-5]\[6-7]\[8-9])
Resolving api.example.com (api.example.com)... failed: Name or service not known.
wget: unable to resolve host address ‘api.example.com’

Answer:

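What you are seeing is not regex parsing but curl's URL globbing: curl expands [] ranges and {} lists in a URL into a sequence of requests. In your example, [4-5][6-7][8-9] expands to 2 x 2 x 2 = 8 URLs, which is why the output is numbered [1/8] through [8/8]. To disable this, use the -g option (long form --globoff), which is listed in the man page under that name. Keep the URL quoted as well, so the shell does not touch the brackets either.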
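
As a rough sketch of how that can look in practice, here are two ways to send the pattern literally: turning globbing off with -g, or percent-encoding the brackets so there is nothing left for curl to expand. The host is the placeholder from the question, and the helper names fetch_literal and encode_brackets are made up for this illustration. Whether the API accepts percent-encoded brackets is an assumption you would need to check against the real server.

```shell
#!/bin/sh
# The pattern from the question, single-quoted so the shell leaves it alone.
regex='(123[4-5][6-7][8-9])'
url="https://api.example.com/v1/?regex=${regex}"

# Option 1: -g (long form --globoff) switches off curl's URL globbing,
# so [] and {} in the URL are not expanded into multiple requests.
# fetch_literal is a hypothetical helper name, not part of curl.
fetch_literal() {
    curl -g "$1"
}

# Option 2: percent-encode the square brackets ([ -> %5B, ] -> %5D).
# encode_brackets is also a hypothetical helper; it only handles brackets.
encode_brackets() {
    printf '%s' "$1" | sed -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

encoded_url="https://api.example.com/v1/?regex=$(encode_brackets "$regex")"
printf '%s\n' "$encoded_url"

# Usage (not run here, because api.example.com is a placeholder host):
#   fetch_literal "$url"
#   curl "$encoded_url"
```

With -g the brackets go out on the wire exactly as typed, which is usually what you want when the server expects the raw pattern. The encoded form is an alternative for cases where you cannot change the curl options.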
