I wrote code that should query the google.com web page and display its contents, but it doesn't work as intended.
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_addr */
#include <unistd.h>      /* close */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    int sockfd;
    struct sockaddr_in destAddr;

    if((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1){
        fprintf(stderr, "Error opening client socket\n");
        return 1;
    }

    destAddr.sin_family = AF_INET;
    destAddr.sin_port = htons(80);
    destAddr.sin_addr.s_addr = inet_addr("64.233.164.94");
    memset(&(destAddr.sin_zero), 0, 8);

    if(connect(sockfd, (struct sockaddr *)&destAddr, sizeof(destAddr)) == -1){
        fprintf(stderr, "Error with client connecting to server\n");
        close(sockfd);
        return 1;
    }

    char *httprequest1 = "GET / HTTP/1.1\r\n"
        "Host: google.com\r\n"
        "\r\n";
    char *httprequest2 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "\r\n";
    char *httprequest3 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "Upgrade-Insecure-Requests: 1\r\n"
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n"
        "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\r\n"
        "\r\n";
    char *httprequest = httprequest2;

    printf("start send\n");
    int send_result = send(sockfd, httprequest, strlen(httprequest), 0);
    printf("send_result: %d\n", send_result);

    #define bufsize 1000
    char buf[bufsize + 1] = {0};   /* one extra byte for the terminating '\0' */
    printf("start recv\n");
    int bytes_readed = recv(sockfd, buf, bufsize, 0);
    printf("end recv: readed %d bytes\n", bytes_readed);
    buf[bufsize] = '\0';
    printf("-- buf:\n");
    puts(buf);
    printf("--\n");

    close(sockfd);
    return 0;
}
If I send httprequest1, I get this output:
gcc -w -o get-google get-google.c
./get-google
start send
send_result: 36
start recv
end recv: readed 528 bytes
-- buf:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:52:16 GMT
Expires: Sun, 09 Oct 2022 11:52:16 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
--
In httprequest2, I changed the Host: header and got the following output:
gcc -w -o get-google get-google.c
./get-google
start send
send_result: 48
start recv
end recv: readed 198 bytes
-- buf:
HTTP/1.1 400 Bad Request
Content-Length: 54
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:53:19 GMT
Connection: close
<html><title>Error 400 (Bad Request)!!1</title></html>
--
Then I tried copying the headers from a browser (httprequest3), and I got the same result as for httprequest2.
How can I get the full page?
CodePudding user response:
The header should be Host: www.google.com, not Host: http://www.google.com/ (the Host header takes a host name, optionally with a port, not a full URL).
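As a rough sketch of that fix, the request can be built from a bare host name while refusing anything that looks like a URL. Note that build_get_request is a made-up helper name for illustration, not something from your program or from any library, and the check here (no slash, space, CR or LF in the host) is only a minimal sanity test, not full validation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the question's code): writes a minimal
 * HTTP/1.1 GET request for "/" into out.
 * host must be a bare host name such as "www.google.com"; a value that
 * contains '/', ' ', '\r' or '\n' (e.g. "http://www.google.com/") is rejected.
 * Returns the request length, or -1 on a bad host or a too-small buffer. */
int build_get_request(char *out, size_t outsize, const char *host)
{
    assert(out != NULL);

    if (host == NULL || host[0] == '\0' || strpbrk(host, "/ \r\n") != NULL)
        return -1;

    int n = snprintf(out, outsize,
                     "GET / HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n", host);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return n;
}
```

With a buffer such as char req[256], the result of build_get_request(req, sizeof req, "www.google.com") could then be passed to send(sockfd, req, n, 0) in place of httprequest.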
However, that might not give you the home page. Google wants you to use HTTPS, so it will probably redirect you to https://www.google.com/, and you won't realistically be able to implement HTTPS yourself (you'll have to use a library like OpenSSL).
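One way to notice that situation in code is to look at the status line and the Location header of whatever came back. The sketch below is illustrative only: http_status, http_location and redirects_to_https are hypothetical helper names, the header match is case-sensitive (a real client should compare header names case-insensitively), and it assumes the whole header block is already in a single NUL-terminated buffer like buf in the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses the status code from a response that starts
 * with a status line such as "HTTP/1.1 301 Moved Permanently".
 * Returns the code, or -1 if the text does not start like HTTP/1.x. */
int http_status(const char *response)
{
    int minor, code;
    assert(response != NULL);
    if (sscanf(response, "HTTP/1.%d %d", &minor, &code) != 2)
        return -1;
    return code;
}

/* Hypothetical helper: copies the value of the Location header into out.
 * Only the header block (everything before the first blank line) is searched.
 * Returns 1 if the header was found and fits in out, 0 otherwise. */
int http_location(const char *response, char *out, size_t outsize)
{
    const char *key = "\nLocation: ";
    const char *end = strstr(response, "\r\n\r\n");   /* end of headers */
    const char *p = strstr(response, key);

    if (p == NULL || outsize == 0 || (end != NULL && p > end))
        return 0;

    p += strlen(key);
    size_t len = strcspn(p, "\r\n");                  /* value ends at CR/LF */
    if (len >= outsize)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Returns 1 when the response is a 3xx redirect whose target starts with
 * "https://", which a plain TCP socket cannot follow without a TLS library. */
int redirects_to_https(const char *response)
{
    char loc[256];
    int code = http_status(response);

    if (code < 300 || code > 399)
        return 0;
    if (!http_location(response, loc, sizeof loc))
        return 0;
    return strncmp(loc, "https://", 8) == 0;
}
```

Called on the 301 response shown in the question, http_status would give 301 and http_location would give http://www.google.com/, so that particular redirect can still be followed over plain HTTP; only a Location starting with https:// needs TLS.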