String in C unexpectedly truncated

Time:06-26

I am working through this book, Hands-On Network Programming with C, and in the current chapter I'm reading we're building a web client. One of the functions of this web client is to parse the URL passed to it to determine the protocol, hostname, document path, etc. Part of the parsing function is below:

void parse_url(char *url, char **hostname, char **port, char **path){
    printf("URL: %s\n", url);

    char *p;
    p = strstr(url, "://");

    char *protocol = 0;
    if (p){
        protocol = url;
        printf("Protocol: %s\n", protocol);
        *p = 0;
        p += 3;
    } else {
        p = url;
    }

    printf("Protocol: %s\n", protocol);
    if (protocol){
        printf("Protocol: %s\n", protocol);
        if (strcmp(protocol, "http")){
            fprintf(stderr, "Unknown protocol, '%s'. Only 'http' is supported.\n",
                protocol);
            exit(1);
        }
    }

Whenever I pass in a URL that doesn't use HTTP, such as https://example.com (the URL used in the book), I get the following output (I added the extra print statements for debugging purposes):

URL: https://example.com

Protocol: https://example.com

Protocol: https

Protocol: https

Unknown protocol, 'https'. Only 'http' is supported.

My question is, how does the protocol, which is pointing to the URL, get truncated to only the protocol rather than the whole URL as it was previously?

CodePudding user response:

The statement p = strstr(url, "://"); finds the first occurrence of "://" in url and stores the address of the first byte of "://" in p. So, *p evaluates to ':'. If no "://" is found, p is NULL.

If "://" is found, protocol is set to point to the beginning of url, and then '\0' is written to the address p points to, overwriting the ':'. So, if url previously contained "https://www.example.com\0", it now contains "https\0//www.example.com\0" (including the original '\0' at the end).

Strings in C are terminated by '\0', so any function that processes the string "https\0//www.example.com\0" stops at the first occurrence of '\0'. Because protocol still points to the start of that buffer, printf("%s", protocol) prints "https", strlen(protocol) returns 5, etc.

CodePudding user response:

As an alternative to overwriting the ':' in the URL with '\0', you can take the same pointer returned by strstr() and use pointer arithmetic to calculate how many characters are in the protocol portion of the URL, then use [strncpy()](https://en.cppreference.com/w/c/string/byte/strncpy) to copy only that many characters into another buffer. We allocate that buffer with calloc() so it is zero-filled and one byte longer than the protocol, which guarantees the string is null-terminated after calling strncpy().

This approach is non-destructive of the original URL string.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char *get_protocol(char *url);

int main() {
    char url[] = "https://www.example.com";
    char *protocol = get_protocol(url);

    printf("%s\n", protocol);

    free(protocol);

    return 0;
}

char *get_protocol(char *url) {
    char *p = strstr(url, "://");
    
    if (!p) return NULL;

    size_t len = p - url;
    char *result = calloc(len + 1, 1);

    if (!result) return NULL;

    strncpy(result, url, len);
    
    return result;
}

Result: https
