Peng Dabo, 2020-08-18 14:41:23
The latest webmagic release, 0.7.3, throws the error below when crawling HTTPS sites that only support TLS 1.2. The author said the bug would be fixed in 0.7.4, but after three years of waiting, 0.7.4 still has not been released.
javax.net.ssl.SSLHandshakeException: Received fatal alert: protocol_version
    at sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at sun.security.ssl.Alert.createSSLException(Alert.java:117)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:314)
    at sun.security.ssl.Alert$AlertConsumer.consume(Alert.java:293)
    at sun.security.ssl.TransportContext.dispatch(TransportContext.java:187)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:154)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1198)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1107)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:400)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:372)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:436)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at us.codecraft.webmagic.downloader.HttpClientDownloader.download(HttpClientDownloader.java:85)
    at us.codecraft.webmagic.Spider.processRequest(Spider.java:404)
    at us.codecraft.webmagic.Spider.access$000(Spider.java:61)
    at us.codecraft.webmagic.Spider$1.run(Spider.java:320)
    at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
As a temporary workaround, change the buildSSLConnectionSocketFactory method of HttpClientGenerator, write your own HttpClientDownloader implementation that uses the modified generator, and set that downloader on the Spider. The modified code is as follows:
    return new SSLConnectionSocketFactory(createIgnoreVerifySSL(), new String[]{"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
            null,
            new DefaultHostnameVerifier());
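To see what the protocol array in the change above does, here is a minimal JDK-only sketch. It is not webmagic or HttpClient code. As far as I understand, SSLConnectionSocketFactory passes that array to setEnabledProtocols on the socket it creates, and the sketch does the same thing on an SSLEngine from the JVM's default SSLContext. The class name TlsProtocolSketch and the helper pickSupported are made up for illustration. The printed lists depend on your JDK version and security settings, so no expected output is shown.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of webmagic or HttpClient.
class TlsProtocolSketch {

    // Keep only the wanted protocol names that appear in `supported`,
    // preserving the order of `wanted`. Passing an unsupported name to
    // setEnabledProtocols would throw IllegalArgumentException.
    static String[] pickSupported(String[] wanted, String[] supported) {
        List<String> supportedList = Arrays.asList(supported);
        List<String> picked = new ArrayList<>();
        for (String protocol : wanted) {
            if (supportedList.contains(protocol)) {
                picked.add(protocol);
            }
        }
        return picked.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Same protocol list as in the modified buildSSLConnectionSocketFactory.
        String[] wanted = {"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"};

        SSLContext context = SSLContext.getDefault();
        SSLEngine engine = context.createSSLEngine();

        System.out.println("supported: " + Arrays.toString(engine.getSupportedProtocols()));
        System.out.println("enabled by default: " + Arrays.toString(engine.getEnabledProtocols()));

        // Explicitly enable the wanted protocols that this JVM supports.
        String[] usable = pickSupported(wanted, engine.getSupportedProtocols());
        engine.setEnabledProtocols(usable);
        System.out.println("enabled after override: " + Arrays.toString(engine.getEnabledProtocols()));
    }
}
```

If TLSv1.2 is missing from the list enabled for a connection, a server that only accepts TLS 1.2 rejects the handshake with the protocol_version alert shown in the stack trace. Note that newer JDKs may still refuse SSLv3 and TLSv1 through their security settings even when they are requested explicitly.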
With that change it works. I have modified the source and compiled it into a jar package; anyone with the same problem can download it and use it directly.
webmagic-core-0.7.3.jar
Extraction code: webm
How to use the jar: put the downloaded jar in the us\codecraft\webmagic-core\0.7.3 directory of your local Maven repository, replacing the original jar. That solves the problem.
CodePudding user response:
The SSL problem can be solved.