How to get original URL from the cURL output log entry?

Time:09-29

I feed cURL multiple URLs at a time and have difficulty parsing the output log to get the original addresses back. Namely, if a URL resolves and connects, the output is as follows:

$ curl --head --verbose https://www.google.com/
*   Trying 64.233.165.106...
* TCP_NODELAY set
* Connected to www.google.com (64.233.165.106) port 443 (#0)
<...>
> HEAD / HTTP/2
> Host: www.google.com
<...>

which can eventually be parsed back to https://www.google.com/.
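
A minimal sketch of such parsing, for illustration only. The helper name `url_from_log` is hypothetical, and mapping port 443 to `https` (any other port to `http`) is an assumption, not something the log states:

```shell
# Hypothetical helper: rebuild the URL from a verbose cURL log read on stdin.
# Assumption: port 443 means https, any other port means http.
url_from_log() {
    awk '
        # "* Connected to <host> (<ip>) port <port> (#0)": host is field 4, port is field 7
        /^\* Connected to / { host = $4; scheme = ($7 == 443) ? "https" : "http" }
        # "> HEAD <path> HTTP/2": the request path is field 3
        /^> (HEAD|GET) /    { print scheme "://" host $3; exit }
    '
}
```

This only works when a "Connected to" line and a request line are both present, which is exactly what is missing when the connection fails.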

However, this does not work for a URL that fails to connect:

$ curl --head --verbose --connect-timeout 3 https://imap.gmail.com/
*   Trying 74.125.131.109...
* TCP_NODELAY set
* After 1491ms connect time, move on!
* connect to 74.125.131.109 port 443 failed: Operation timed out
<...>
* Failed to connect to imap.gmail.com port 443: Operation timed out

The error message contains the host name in this case, but in other cases it does not, so I can't rely on it.

So I need either to have URL-to-IP resolution hidden in the output, like

*   Trying https://imap.gmail.com/...

or somehow append each URL from the list to the corresponding output, like:

$ curl --head --verbose --connect-timeout 3 https://imap.gmail.com/ https://www.google.com/

https://imap.gmail.com/
*   Trying 64.233.162.108...
* TCP_NODELAY set
* After 1495ms connect time, move on!
* connect to 64.233.162.108 port 443 failed: Operation timed out
<...>

https://www.google.com/
*   Trying 74.125.131.17...
* TCP_NODELAY set
* Connected to www.google.com (74.125.131.17) port 443 (#0)
<...>

Wget and HTTPie are not an option. How can one achieve that with cURL?

CodePudding user response:

Perhaps this is the solution:

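# Read one URL per line and log it ahead of the corresponding cURL output.
while IFS= read -r url; do
    printf 'REQUESTED URL: %s\n' "$url" >> output.txt
    curl --head --verbose --connect-timeout 3 "$url" >> output.txt 2>&1
done < url-list.txt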
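
If the URLs have to stay in a single cURL invocation, the `--write-out` option may be worth trying instead. The sketch below is untested against the asker's setup. The function name `check_urls` and the marker text are hypothetical, and it assumes a cURL build that supports the `%{stderr}` write-out variable:

```shell
# Sketch: tag each transfer in one cURL run with the URL it belongs to.
# check_urls is a hypothetical name; the marker text is arbitrary.
check_urls() {
    # %{stderr} sends the marker to the same stream as the --verbose log;
    # %{url_effective} expands to the URL cURL last used for that transfer.
    curl --head --verbose --connect-timeout 3 \
         --write-out '%{stderr}REQUESTED URL: %{url_effective}\n\n' \
         "$@"
}
```

Usage would look like `check_urls https://imap.gmail.com/ https://www.google.com/ > output.txt 2>&1`. Note that the marker is written after each transfer's log rather than before it, and it is worth checking on your cURL version that it is also emitted for transfers that fail.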