How to get the google.com web page using a C socket

Time:09-10

I wrote code that should query the google.com web page and display its contents, but it doesn't work as intended.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_addr */
#include <unistd.h>      /* close */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    int sockfd;
    struct sockaddr_in destAddr;

    if((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1){
        fprintf(stderr, "Error opening client socket\n");
        return 1;   /* no descriptor to close when socket() fails */
    }

    destAddr.sin_family = AF_INET;
    destAddr.sin_port = htons(80);
    destAddr.sin_addr.s_addr = inet_addr("64.233.164.94");
    memset(&(destAddr.sin_zero), 0, 8);

    if(connect(sockfd, (struct sockaddr *)&destAddr, sizeof(destAddr)) == -1){
        fprintf(stderr, "Error with client connecting to server\n");
        close(sockfd);
        return 1;
    }

    char *httprequest1 = "GET / HTTP/1.1\r\n"
        "Host: google.com\r\n"
        "\r\n";

    char *httprequest2 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "\r\n";

    char *httprequest3 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "Upgrade-Insecure-Requests: 1\r\n"
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n"
        "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\r\n"
        "\r\n";

    char *httprequest = httprequest2;
   
    printf("start send\n");
    int send_result = send(sockfd, httprequest, strlen(httprequest), 0);
    printf("send_result: %d\n", send_result);

    #define bufsize 1000
    char buf[bufsize + 1] = {0};

    printf("start recv\n");
    int bytes_readed = recv(sockfd, buf, bufsize, 0);
    printf("end recv: readed %d bytes\n", bytes_readed);

    buf[bufsize] = '\0';
    printf("-- buf:\n");
    puts(buf);
    printf("--\n");


    return 0;
}

If I send httprequest1, I get this output:

gcc -w -o get-google get-google.c
./get-google
start send
send_result: 36
start recv
end recv: readed 528 bytes
-- buf:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:52:16 GMT
Expires: Sun, 09 Oct 2022 11:52:16 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

--

In httprequest2, I changed the Host: header and got the following output:

gcc -w -o get-google get-google.c
./get-google
start send
send_result: 48
start recv
end recv: readed 198 bytes
-- buf:
HTTP/1.1 400 Bad Request
Content-Length: 54
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:53:19 GMT
Connection: close

<html><title>Error 400 (Bad Request)!!1</title></html>
--

Then I tried copying the headers from my browser into httprequest3, and I got the same result as for httprequest2.

How can I get the full page?

CodePudding user response:

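The header should be Host: www.google.com, not Host: http://www.google.com/. The Host header takes only the host name (optionally followed by a port), not a full URL with a scheme and path, which is why the server answers 400 Bad Request.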
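As a minimal sketch of that fix, the request could be assembled by a small helper that refuses a Host value containing a scheme or a slash. The function name build_get_request and the extra Connection: close header are assumptions for illustration and are not part of the original code; Connection: close simply asks the server to close the connection once the response has been sent.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): writes a minimal HTTP/1.1 GET
 * request for `host` and `path` into `out`. Returns the request length, or -1
 * if `host` is not a bare host name or the buffer is too small. */
int build_get_request(char *out, size_t cap, const char *host, const char *path)
{
    /* The Host header takes "host" or "host:port" only, so reject anything
     * that contains a scheme ("://") or a path separator. */
    if (strstr(host, "://") != NULL || strchr(host, '/') != NULL)
        return -1;

    /* "Connection: close" is an assumption added for this sketch. */
    int n = snprintf(out, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= cap)
        return -1;   /* formatting error or truncated output */
    return n;
}
```

Calling build_get_request(req, sizeof req, "www.google.com", "/") fills req with a well-formed request that can be passed to send(), while passing "http://www.google.com/" as the host returns -1 instead of producing the request that triggered the 400.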

However, that might still not give you the home page. Google wants you to use HTTPS, so it will probably redirect you to https://www.google.com/. Implementing HTTPS (TLS) fully yourself is impractical, so you'll have to use a library such as OpenSSL.
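To see whether a response is such a redirect, the status line and the Location header can be inspected before deciding whether a TLS library is needed. The following is only a rough sketch: the helper names http_status, http_location, and redirects_to_https are hypothetical, the response is assumed to be a NUL-terminated buffer like buf in the question, and the header name is matched case-sensitively for brevity even though real header names are case-insensitive.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the status code from the status line of a
 * NUL-terminated HTTP response, or -1 if the line is malformed. */
int http_status(const char *resp)
{
    int major, minor, status;
    if (sscanf(resp, "HTTP/%d.%d %d", &major, &minor, &status) != 3)
        return -1;
    return status;
}

/* Hypothetical helper: copies the value of the Location header into `out`.
 * Returns 1 if the header is present and fits, 0 otherwise.
 * Simplification: the header name is matched case-sensitively. */
int http_location(const char *resp, char *out, size_t cap)
{
    static const char key[] = "\r\nLocation: ";
    const char *headers_end = strstr(resp, "\r\n\r\n");
    const char *p = strstr(resp, key);

    /* Ignore matches that fall inside the body rather than the headers. */
    if (p == NULL || (headers_end != NULL && p > headers_end))
        return 0;

    p += strlen(key);
    size_t len = strcspn(p, "\r\n");   /* value runs to the end of the line */
    if (len >= cap)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Returns 1 if the response is a 3xx redirect whose target is an https URL. */
int redirects_to_https(const char *resp)
{
    char location[512];
    int status = http_status(resp);

    if (status < 300 || status > 399)
        return 0;
    if (!http_location(resp, location, sizeof location))
        return 0;
    return strncmp(location, "https://", 8) == 0;
}
```

For the 301 response shown in the question, http_status gives 301 and http_location gives http://www.google.com/, so redirects_to_https returns 0 and a plain socket can follow the redirect. If it returns 1, the next request has to go through a TLS library.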
