This is my college assignment: fetch a web page from any web server by URL, using a TCP socket and an HTTP "GET" request. But I am not getting an HTTP/1.0 200 OK response from any server.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;

public class DCCN042 {
    public static void main(String[] args) {
        Scanner inpt = new Scanner(System.in);
        System.out.print("Enter URL: ");
        String url = inpt.next();
        TCPConnect(url);
    }

    public static void TCPConnect(String url) {
        try {
            String hostname = new URL(url).getHost();
            System.out.println("Loading contents of Server: " + hostname);
            InetAddress ia = InetAddress.getByName(hostname);
            String ip = ia.getHostAddress();
            System.out.println(ip + " is IP Adress for " + hostname);
            String path = new URL(url).getPath();
            System.out.println("Requested Path on the server: " + path);
            Socket socket = new Socket(ip, 80);
            // Create input and output streams to read from and write to the server
            PrintStream out = new PrintStream(socket.getOutputStream());
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            // Follow the HTTP protocol: request line, header lines, then an empty line
            if (!path.isEmpty()) {
                // The URL has a path: request it
                //Request Line
                out.println("GET " + path + " HTTP/1.1");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            } else {
                // No path (e.g. "http://example.com"): request the root "/"
                //Request Line
                out.println("GET / HTTP/1.0");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            }
            // Read data from the server until we finish reading the document
            String line = in.readLine();
            while (line != null) {
                System.out.println(line);
                line = in.readLine();
            }
            // Close our streams
            in.close();
            out.close();
            socket.close();
        } catch (Exception e) {
            System.out.println("Invalid URL");
            e.printStackTrace();
        }
    }
}
I create a TCP socket and pass it the IP address returned by InetAddress's getHostAddress() method, plus port 80 for the web server. I use getHost() and getPath() to split the hostname and path out of the URL, and I use that same path and hostname in the HTTP GET request. This is the response from the server:
Enter URL: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
Loading contents of Server: stackoverflow.com
151.101.65.69 is IP Adress for stackoverflow.com
Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
HTTP/1.1 301 Moved Permanently
cache-control: no-cache, no-store, must-revalidate
location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
feature-policy: microphone 'none'; speaker 'none'
content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
Accept-Ranges: bytes
Transfer-Encoding: chunked
Date: Mon, 27 Dec 2021 15:00:17 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-qpg1263-QPG
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1640617217.166650,VS0,VE338
Vary: Fastly-SSL
X-DNS-Prefetch-Control: off
Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
0
My requirement is to get the HTML source code of this web page, with an HTTP/1.0 200 OK response.
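For reference, here is a small sketch of what java.net.URL reports for the URL above. The class UrlPartsDemo and its helpers parse and requestTarget are hypothetical; requestTarget assumes you want "/" when the URL has no path (getPath() returns an empty string then) and that the query string should be kept, since getPath() drops it. Note that getProtocol() already tells you the scheme is https, which the code above never looks at.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlPartsDemo {
    // Hypothetical helper: wraps the checked MalformedURLException.
    static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + s, e);
        }
    }

    // Hypothetical helper: the request target to put after "GET ".
    // getPath() is "" for "https://example.com" and omits "?query", so handle both.
    static String requestTarget(URL u) {
        String target = u.getPath().isEmpty() ? "/" : u.getPath();
        if (u.getQuery() != null) {
            target += "?" + u.getQuery();
        }
        return target;
    }

    public static void main(String[] args) {
        URL u = parse("https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
        System.out.println(u.getProtocol());    // prints https
        System.out.println(u.getHost());        // prints stackoverflow.com
        System.out.println(requestTarget(u));   // prints the /questions/... path
        System.out.println(u.getDefaultPort()); // prints 443 (it would be 80 for http)
    }
}
```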
Answer:
This is happening because you are using a plain Socket with a hardcoded port 80. This means that, regardless of whether your input URL is http or https, you are always making the request over the unsecured http protocol.
In this situation, the server is essentially telling you: "You are trying to reach me over unsecured HTTP. Use a secure protocol, HTTPS." So it responds with 301 Moved Permanently (which just means "use this URL, not the original one"), with the Location header pointing to the correct URL, the https one.
So the 301 Location looks like the same URL you requested, but it isn't: your code always talks plain http (a plain Socket on port 80), while the server is redirecting you to https.
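To tell a redirect from the page you want, you can check the status code and the Location header before treating the rest as HTML. A minimal sketch, assuming the response headers have already been read into a string; RedirectCheck, statusCode and location are hypothetical names, and the sample headers are copied from the output in the question.

```java
public class RedirectCheck {
    // Status code from a status line such as "HTTP/1.1 301 Moved Permanently".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Value of the Location header, or null if there is none.
    // Header names are case-insensitive (this server sends "location" in lower case).
    static String location(String rawResponse) {
        String[] lines = rawResponse.split("\r?\n");
        // lines[0] is the status line; the headers end at the first empty line.
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0 && lines[i].substring(0, colon).trim().equalsIgnoreCase("Location")) {
                return lines[i].substring(colon + 1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Excerpt of the response shown in the question.
        String raw = "HTTP/1.1 301 Moved Permanently\r\n"
                + "cache-control: no-cache, no-store, must-revalidate\r\n"
                + "location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets\r\n"
                + "\r\n";
        int code = statusCode(raw.split("\r?\n")[0]);
        if (code >= 300 && code < 400) {
            System.out.println(code + " redirect to " + location(raw));
        }
    }
}
```

Because the scheme in that Location is https, following it means opening a TLS connection rather than re-sending the same request on port 80.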
To make your code work with https, use an SSLSocket instead of a plain Socket:
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket) factory.createSocket(ia, 443);
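Note that I'm passing ia (the InetAddress, which remembers the host name it was looked up with) rather than the ip string, because for https the server's certificate has to correspond to the domain. If you connect using the bare IP, you can end up with a certificate error such as a CertificateExpiredException.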
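Putting the pieces together, here is a sketch of a complete HTTPS fetch over an SSLSocket. It is an illustration under assumptions, not the only way to do it: the class name HttpsGet and the helper buildRequest are hypothetical, the request is deliberately HTTP/1.0 with explicit CRLF line endings, and it omits Accept-Encoding so the body should in practice arrive as plain text. HTTP/1.0 also rules out chunked transfer encoding (the "Transfer-Encoding: chunked" header and the stray "0" in the question's output). Even for an HTTP/1.0 request, servers usually answer with their own version in the status line, so expect something like "HTTP/1.1 200 OK" rather than "HTTP/1.0 200 OK".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HttpsGet {
    // Builds an HTTP/1.0 GET request. HTTP expects CRLF line endings, so they are
    // written explicitly instead of relying on println's platform line separator.
    // No Accept-Encoding header, and "Connection: close" so the server closes the
    // stream after the response (which is what ends the read loop below).
    static String buildRequest(String host, String target) {
        return "GET " + target + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: Java\r\n"
                + "Accept: text/html\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        String host = "stackoverflow.com";
        String target = "/questions/33015868/java-simple-http-get-request-using-tcp-sockets";

        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        // Connect by host name so it is sent in the TLS handshake (SNI).
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            // A raw SSLSocket does not check the certificate's host name by default; turn that on.
            SSLParameters params = socket.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            socket.setSSLParameters(params);

            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, target).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            // Status line, headers, blank line, then the HTML, until the server closes the connection.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```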
Now, whether to use a Socket or an SSLSocket is something you will have to decide programmatically, based on the scheme of the URL the user enters.
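One way to make that choice is to look at the URL's scheme and port. This is a sketch under assumptions: SocketChooser, portFor and open are hypothetical names, only http and https are supported, and the default ports 80 and 443 are used when the URL does not specify one. It uses java.net.URI so parsing does not throw a checked exception.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.URI;
import java.util.Locale;
import javax.net.ssl.SSLSocketFactory;

public class SocketChooser {
    // Port from the URL if it has one, otherwise the scheme's default.
    static int portFor(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        switch (scheme(uri)) {
            case "http":
                return 80;
            case "https":
                return 443;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + uri.getScheme());
        }
    }

    // Lower-cased scheme, or "" when the URI has none.
    private static String scheme(URI uri) {
        return uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
    }

    // Plain TCP socket for http, TLS socket for https; other schemes are rejected.
    static Socket open(URI uri) throws IOException {
        String s = scheme(uri);
        if (!s.equals("http") && !s.equals("https")) {
            throw new IllegalArgumentException("Unsupported scheme: " + uri.getScheme());
        }
        int port = portFor(uri);
        if (s.equals("https")) {
            return SSLSocketFactory.getDefault().createSocket(uri.getHost(), port);
        }
        return new Socket(uri.getHost(), port);
    }
}
```

Usage would look like Socket socket = SocketChooser.open(URI.create(url)); after which the same request (with uri.getHost() in the Host header) can be written to socket.getOutputStream() regardless of which kind of socket was returned.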