I get the error "error code: 1020" when I try to crawl this page for form data: https://v2.gcchmc.org/medical-status-search/. Can you help me find what is wrong with my code?
This is my code:
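// Fetch the page that contains the form.
$initial = file_get_contents('https://v2.gcchmc.org/medical-status-search/');
// Replace the whole document with the captured token value; preg_replace() takes pattern, replacement, subject.
$check = preg_replace('/.*?input type="hidden" name="csrfmiddlewaretoken" value="(.*?)".*/sim', '$1', $initial);
print $check;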
CodePudding user response:
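The site is protected by Cloudflare, and "error code: 1020" is Cloudflare's access-denied response. You can only get past the Cloudflare check with a client that has JavaScript enabled, so a plain HTTP request such as file_get_contents(), or anything else run from the command line, is not going to work. You can, however, automate a real browser, for example with Puppeteer, which is also available from PHP through PuPHPeteer. You have to disable headless mode to make it work.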
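If you want to confirm that the request is being rejected by Cloudflare rather than failing in the regex, you can look at the raw response first. This is only a minimal sketch: it assumes the block arrives as an HTTP 403 whose body contains the text "error code: 10xx", as in the question. The names looksLikeCloudflareBlock and fetchStatusAndBody are hypothetical helpers, not part of any library.

```php
<?php
// Hypothetical helper: true when a response looks like a Cloudflare
// "error code: 10xx" block page (assumed shape: 403 status + that text in the body).
function looksLikeCloudflareBlock(string $statusLine, string $body): bool
{
    $isForbidden = preg_match('#^HTTP/\S+\s+403\b#', $statusLine) === 1;
    $hasErrorCode = preg_match('/error code:\s*10\d{2}/i', $body) === 1;
    return $isForbidden && $hasErrorCode;
}

// Hypothetical helper: fetch a URL and return [first status line, body].
function fetchStatusAndBody(string $url): array
{
    // ignore_errors makes file_get_contents() return the body even for 4xx/5xx responses.
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = file_get_contents($url, false, $context);
    // The HTTP wrapper fills $http_response_header in the local scope.
    $statusLine = $http_response_header[0] ?? '';
    return [$statusLine, $body === false ? '' : $body];
}
```

You would call it as [$status, $body] = fetchStatusAndBody($url); and then check looksLikeCloudflareBlock($status, $body) before trying to parse anything.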
Installation
composer require nesk/puphpeteer
npm install @nesk/puphpeteer
The script (test.php)
<?php
use Nesk\Puphpeteer\Puppeteer;
require_once __DIR__ . "/vendor/autoload.php";
// Return the value of the hidden csrfmiddlewaretoken input, or null if it is not found.
function getToken($content)
{
    if (preg_match('/input type="hidden" name="csrfmiddlewaretoken" value="(.*?)"/si', $content, $matches)) {
        return $matches[1];
    }
    return null;
}
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch(['headless'=>false]);
/** @var \Nesk\Puphpeteer\Resources\Page $page */
$page = $browser->newPage();
$page->goto('https://v2.gcchmc.org/medical-status-search/');
var_dump(getToken($page->content()));
$browser->close();
You probably don't need the csrfmiddlewaretoken when running the script like this, but you can take it further from here if you choose to use it.
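If you do end up needing the token, a DOM parser is usually less fragile than a regex, because it does not depend on the order of the attributes or on the whitespace between them. Below is a minimal sketch using PHP's built-in DOMDocument (this needs the dom extension). The function name extractHiddenInput and the sample markup are made up for illustration; in the script above you would pass $page->content() instead.

```php
<?php
// Hypothetical helper: return the value of the <input> with the given name, or null.
function extractHiddenInput(string $html, string $name): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === $name) {
            return $input->getAttribute('value');
        }
    }
    return null;
}

// Made-up sample markup standing in for $page->content().
$sample = '<form method="post"><input type="hidden" name="csrfmiddlewaretoken" value="abc123"></form>';
var_dump(extractHiddenInput($sample, 'csrfmiddlewaretoken'));
```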