OK guys, I'll try to keep this question as simple as possible (I've been struggling to find the problem for a few days now).
I have this function:
function readContent($url, $sessionId)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    $cookiesFile = realpath(__DIR__.'/cookies/'.$sessionId.'.txt');
    if(file_exists($cookiesFile))
    {
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiesFile);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiesFile);
    }
    //DEBUG------------------------------------------------------------------
    $debugFile = fopen(__DIR__.'/debug/'.$sessionId.'.txt', 'w');
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $debugFile);
    //-----------------------------------------------------------------------
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    try
    {
        $output = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if($httpCode == 200)
        {
            file_put_contents(__DIR__.'/contents/'.$sessionId.'.html', $output);
            curl_close($ch);
            // Look for a META REFRESH redirect in the returned HTML
            preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $output, $match);
            if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
            {
                // Decode &amp; in the URL, then follow the refresh by calling the function again
                return readContent(str_replace('&amp;', '&', $match[1][0]), $sessionId);
            }
        }
    }
    catch(Exception $e)
    {
        return false;
    }
}
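To make the META REFRESH step easier to test on its own, here is a minimal sketch that pulls the detection out into a separate helper. The name extractMetaRefreshUrl() and the sample markup are made up for illustration. It uses the same regex as above, but swaps str_replace() for html_entity_decode(), which is a substitution of mine and not what the function above does.

```php
<?php
// Hypothetical helper (not part of the original function): returns the
// META REFRESH target URL found in $html, or null when there is none.
function extractMetaRefreshUrl(string $html): ?string
{
    // Same pattern as in readContent() above.
    $pattern = '/<[\s]*meta[\s]*http-equiv="?REFRESH"?'
        . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?'
        . '[\s]*[\/]?[\s]*>/si';

    if (preg_match($pattern, $html, $match) !== 1) {
        return null;
    }

    // Decode entities such as &amp; so the query string is usable as a URL.
    return html_entity_decode(trim($match[1]));
}

// Made-up sample markup, for illustration only.
$html = '<meta http-equiv="refresh" content="0; URL=https://redacted.com/location1/?a=1&amp;from_meta_refresh=1">';
var_dump(extractMetaRefreshUrl($html));
var_dump(extractMetaRefreshUrl('<p>no redirect here</p>'));
```

With a helper like this, the recursive call would only need to check for a non-null return value.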
It makes 3 requests (logs from the STDERR file):
GET /redacted/ HTTP/2
Host: redacted.com
accept: */*
< HTTP/2 302
< location: https://redacted.com/location1/
* Added cookie redacted="redacted" for domain redacted.com, path /, expire 1634127679
< set-cookie: redacted=redacted; path=/; expires=Wed, 13 Oct 2021 12:21:19 GMT; SameSite=Lax
multiple set-cookie / added cookie follows...
So the first request is made without any cookies, but it sets multiple cookies in the COOKIEFILE/COOKIEJAR. When cURL follows the location given in the response, it sends all of the cookies that were set:
* Ignoring the response-body
* Connection #0 to host redacted.com left intact
* Issue another request to this URL: 'https://redacted.com/location1/'
> GET /location1/ HTTP/2
Host: redacted.com
accept: */*
cookie: redacted=redacted;redacted=redacted;redacted=redacted......
< HTTP/2 200
The second request returns 200 OK with a META REFRESH in the HTML body. As you can see in the function, when it gets a 200 whose body matches the META REFRESH regex, it calls itself with the URL from the META REFRESH. That leads to the 3rd request, but this time it sends only 1 cookie:
> GET /location1/?from_meta_refresh=1 HTTP/2
Host: redacted.com
accept: */*
cookie: redacted=redacted
This is an invalid request for this page, because it requires other cookies that were set and are in the CURLOPT_COOKIEFILE file but aren't sent. The only cookie sent is the one on the last line of the COOKIEFILE. I thought it might be a cookie attribute such as domain/path/secure, but I noticed another cookie in the file with exactly the same attributes that is not sent on this request:
redacted.com FALSE / FALSE 1634128519 not_sent redacted
redacted.com FALSE / FALSE 1634128519 sent redacted
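To compare entries like these field by field, a small parser for the cookie file may help. This is only a sketch: parseCookieJar() is a hypothetical name, and it assumes the file is in the tab-separated Netscape format that cURL writes (domain, include-subdomains flag, path, secure flag, expiry, name, value), with HttpOnly cookies marked by a "#HttpOnly_" prefix.

```php
<?php
// Hypothetical helper: parses Netscape-format cookie file contents into rows.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $httpOnly = false;
        // Lines starting with "#HttpOnly_" are real cookies, not comments.
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            $httpOnly = true;
            $line = substr($line, 10);
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain' => $domain,
            'include_subdomains' => $subdomains === 'TRUE',
            'path' => $path,
            'secure' => $secure === 'TRUE',
            'expires' => (int) $expires,
            'http_only' => $httpOnly,
            'name' => $name,
            'value' => $value,
        ];
    }
    return $cookies;
}

// Sample input built from the two redacted lines above (tabs assumed).
$sample = "# Netscape HTTP Cookie File\n"
    . "redacted.com\tFALSE\t/\tFALSE\t1634128519\tnot_sent\tredacted\n"
    . "redacted.com\tFALSE\t/\tFALSE\t1634128519\tsent\tredacted\n";
foreach (parseCookieJar($sample) as $cookie) {
    echo $cookie['name'], ' ', $cookie['domain'], ' ', $cookie['path'], "\n";
}
```

Running this against the file right before each request would show which cookies are actually on disk at that moment.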
Please, help!
CodePudding user response:
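PHP cURL wasn't saving the cookies to the COOKIEJAR file when curl_close() was called. A likely explanation: since PHP 8.0, curl_close() has no effect, and the handle is only released (and the jar written) once nothing references it anymore, which here is after the recursive call has already run. So I needed to save the cookies explicitly before making the next request, using: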
curl_setopt($ch, CURLOPT_COOKIELIST, 'FLUSH');
After this, the cookies were available for the next request.
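Here is a minimal, offline sketch of what the FLUSH call does. The temporary jar path and the "demo" cookie are made up for illustration, and the cookie is added by hand through CURLOPT_COOKIELIST so that no request is needed. It is meant to show the mechanism, not to be a drop-in fix.

```php
<?php
// Temporary jar file for the demonstration (hypothetical path).
$jar = tempnam(sys_get_temp_dir(), 'jar');

$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // enable the cookie engine
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // file the cookies get written to

// Add a made-up cookie by hand (Netscape format, tab-separated).
curl_setopt($ch, CURLOPT_COOKIELIST, "example.com\tFALSE\t/\tFALSE\t0\tdemo\tvalue");

// Write all known cookies to the jar now, while the handle is still open.
curl_setopt($ch, CURLOPT_COOKIELIST, 'FLUSH');

$jarContents = file_get_contents($jar);
echo $jarContents;
unlink($jar);
```

In readContent(), the equivalent FLUSH call would go right before the recursive call, so that the next handle reads an up-to-date file.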