I am trying to run a script that will search Healthline with a query string and determine if there are any search results, but I can't get the contents with the query string posting to the page. To search for something on their site, you go to https://www.healthline.com/search?q1=search string
.
Here is what I tried:
$healthline_url = 'https://www.healthline.com/search';
$search_string = 'ashwaganda';
$postdata = http_build_query(
array(
'q1' => $search_string
)
);
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
'content' => $postdata
)
);
$stream = stream_context_create($opts);
$theHtmlToParse = file_get_contents($healthline_url, false, $stream);
print_r($theHtmlToParse);
I also tried to just add the query string to the url and skip the stream, amongst other variations, but I'm running out of ideas. This also didn't work:
$healthline_url = 'https://www.healthline.com/search';
$search_string = 'ashwaganda';
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Content-Type: text/xml; charset=utf-8"
)
);
$stream = stream_context_create($opts);
$theHtmlToParse = file_get_contents($healthline_url.'&q1='.$search_string, false, $stream);
print_r($theHtmlToParse);
And suggestions?
EDIT: I changed the url in case someone wants to look at the search page. Also fixed the query string. Still doesn't work.
In response to Ken Lee, I did try the following cURL script that also just returns the page without search results:
$healthline_url = 'https://www.healthline.com/search?q1=ashwaganda';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $healthline_url);
$data = curl_exec($ch);
curl_close($ch);
print_r($data);
CodePudding user response:
Healthline does not load the search result directly. It has its search index stored in Algolia and made extra javascript calls to retrieve the result. Therefore you cannot see the search result by file_get_content
.
To see the search result, you need to run a browser simulator that simulates a javascript-capable browser to properly run the site page.
For PHP developers, you may try using php-webdriver to control browers through webdriver (e.g. Selenium, Chrome chromedriver, Firefox geckodriver).
Update: Didn't know that the target site is Healthline. Updated the answer once I found out.