IdHTTP returns only part of the page content when reading a web page

Time:09-28


Test page: the Xinjiang Shishicai (lottery) trend chart page
http://www.xjflcp.com/trend/analyseSSC.do?Operator=codeTrend&PageCount=100

Test environment:
Windows 7 + C++Builder 6.0 + Indy IdHTTP component

Code:
AnsiString sURL = "http://www.xjflcp.com/trend/analyseSSC.do?Operator=codeTrend&PageCount=100";
int iPageLen;

IdHTTP1->Request->UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
strHtmlText = IdHTTP1->Get(sURL.c_str());
iPageLen = strHtmlText.Length();

if (strHtmlText.Length() > 0 && IdHTTP1->ResponseCode == 200)
{
    // process the page content ...
}

Test result:
iPageLen is 40756; only part of the page content is returned.


Test results with third-party tools:
Using curl and wget:
curl -o xjsscraradata.html "http://www.xjflcp.com/trend/analyseSSC.do?Operator=codeTrend&PageCount=100"
wget -c "http://www.xjflcp.com/trend/analyseSSC.do?Operator=codeTrend&PageCount=100"
The downloaded page is still 40756 bytes long, so it is still incomplete.

Next I captured the traffic with a third-party HTTP sniffer, HttpAnalyzer, and found the following for this page.
The HTTP Request Header:
========================================================================
GET /trend/analyseSSC.do?Operator=codeTrend&PageCount=100 HTTP/1.1
Host: www.xjflcp.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4
Cookie: JSESSIONID=E13A89BEBACE1EE3549E5F888EAB8C8F; __utma=62085349.1407844871.1443609161.1444145674.1444192727.19; __utmc=62085349; __utmz=62085349.1444139783.17.2.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; CNZZDATA873913=cnzz_eid%3D707946308-1443606074-%26ntime%3D1444188798
========================================================================

The HTTP Response Header:
================================================
HTTP/1.1 200 OK
Server: nginx/1.0.14
Date: Wed, 07 Oct 2015 05:15:37 GMT
Content-Type: text/html; charset=GBK
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: JSESSIONID=2e03576fd1f5f1f012ee454441a13d27; Path=/
Content-Encoding: gzip
=================================================
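
The two response headers that matter for the debugging below are Content-Encoding and Transfer-Encoding. As a rough illustration only, here is a sketch in plain standard C++ (not Indy, and not the code from my program) that looks up a header value in a raw header block. The helper name headerValue is made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case a copy of s (ASCII only) so header names compare case-insensitively.
static std::string toLower(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Strip spaces, tabs, CR and LF from both ends of s.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: return the value of header `name` from a raw
// header block, or an empty string if the header is not present.
std::string headerValue(const std::string& rawHeaders, const std::string& name) {
    std::istringstream in(rawHeaders);
    std::string line;
    const std::string wanted = toLower(name);
    while (std::getline(in, line)) {
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;  // status line or blank line
        if (toLower(trim(line.substr(0, colon))) == wanted)
            return trim(line.substr(colon + 1));
    }
    return "";
}
```

With the response captured above, headerValue(hdr, "Content-Encoding") would give "gzip" and headerValue(hdr, "Transfer-Encoding") would give "chunked", which is why the next two attempts look at compression and chunking.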

Program debugging:
First attempt: add AcceptEncoding = "gzip, deflate, sdch" to the request header.

IdHTTP1->Request->UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
IdHTTP1->Request->AcceptEncoding = "gzip, deflate, sdch";

strHtmlText = IdHTTP1->Get(sURL.c_str());
iPageLen = strHtmlText.Length();

if (strHtmlText.Length() > 0 && IdHTTP1->ResponseCode == 200)
{
    // process the page content ...
}
The returned content is gibberish. The likely reason is that the page comes back compressed and Indy does not decode it on its side, so I gave up on this approach for now and tried another one.
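
One way to confirm that the "gibberish" really is a gzip-compressed body is to look at its first bytes: a gzip stream starts with 0x1F 0x8B, followed by 0x08 for the deflate method. A minimal sketch in plain standard C++, assuming the raw response body has been copied into a std::string (the function name looksLikeGzip is made up for this example):

```cpp
#include <cassert>
#include <string>

// True if buf starts with the gzip magic bytes 0x1F 0x8B followed by
// compression method 8 (deflate), as laid out in the gzip file format.
bool looksLikeGzip(const std::string& buf) {
    return buf.size() >= 3 &&
           static_cast<unsigned char>(buf[0]) == 0x1F &&
           static_cast<unsigned char>(buf[1]) == 0x8B &&
           static_cast<unsigned char>(buf[2]) == 0x08;
}
```

If this returns true for the data from Get(), the body was received intact but never decompressed. The check only identifies the format; it does not decompress anything.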

The response header for this page contains Transfer-Encoding: chunked,
so I thought of using HTTP 1.0 to avoid chunked mode:
IdHTTP1->ProtocolVersion = pv1_0;
IdHTTP1->Request->UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";

strHtmlText = IdHTTP1->Get(sURL.c_str());
iPageLen = strHtmlText.Length();

if (strHtmlText.Length() > 0 && IdHTTP1->ResponseCode == 200)
{
    // process the page content ...
}
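
For reference, chunked transfer encoding only changes how the body is framed: each chunk is a hexadecimal size line, CRLF, that many bytes of data, CRLF, and a chunk of size 0 ends the body. The sketch below shows that framing in plain standard C++. It is an illustration, not Indy's implementation; the function name decodeChunked is made up here, and trailer headers after the last chunk are ignored.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Decode an HTTP/1.1 chunked body held in `in`.
// Returns true when the terminating zero-size chunk is reached.
// Returns false on malformed or truncated input; in that case `out`
// holds only the complete chunks decoded so far.
bool decodeChunked(const std::string& in, std::string& out) {
    out.clear();
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type eol = in.find("\r\n", pos);
        if (eol == std::string::npos) return false;   // size line missing
        std::string sizeLine = in.substr(pos, eol - pos);
        std::string::size_type semi = sizeLine.find(';');
        if (semi != std::string::npos) sizeLine.erase(semi);  // drop chunk extension
        char* end = 0;
        unsigned long n = std::strtoul(sizeLine.c_str(), &end, 16);
        if (end == sizeLine.c_str()) return false;    // no hex digits found
        pos = eol + 2;
        if (n == 0) return true;                      // last chunk
        std::string::size_type remaining = in.size() - pos;
        if (n > remaining || remaining - n < 2) return false;  // truncated chunk
        out.append(in, pos, n);
        pos += n;
        if (in.compare(pos, 2, "\r\n") != 0) return false;     // missing CRLF after data
        pos += 2;
    }
}
```

A false return with a non-empty out would mean the stream stopped in the middle of a chunk, which is one way a page can end up incomplete.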

I am completely stuck: the result is still the same. Please help!



CodePudding user response:

Could one of the experts here please help me? Thank you.