reduce the size of a string


(Disclaimer: this is not the complete exercise because I still have to finish it, but the error occurs in this part of the code.)

I did this exercise to practice memory allocation.

Create a function that takes a URL (a C string) and returns the name of the website (with "www." and the extension). For example, given Wikipedia's link, "http://www.wikipedia.org/", it has to return only "www.wikipedia.org" in another string (dynamically allocated on the heap).

This is my idea so far: run a for-loop, and once "i" is greater than 6, start copying each character into another string until "/" is reached. I need to allocate the other string first and then reallocate it as it grows.

Here's my attempt so far:

char *read_website(const char *url) {
    char *str = malloc(sizeof(char)); 
    if (str == NULL) {
        exit(1); 
    }
    for (unsigned int i = 0; url[i] != "/" && i > 6; ++i) {
        if (i <= 6) {
            continue; 
        }
        char* s = realloc(str, sizeof(char) + 1); 
        if (s == NULL) {
            exit(1); 
        }
        *str = *s; 
    }
    return str; 
}

int main(void) {
    char s[] = "http://www.wikipedia.org/"; 
    char *str = read_website(s); 
    return 0; 
}

(1) By debugging line by line, I've noticed that the program ends as soon as the for-loop is reached. (Solved) I've realized it's better to delete the if (i <= 6) check and change the for-loop's starting point instead; now the for-loop starts with i = 7.

(2) Another thing: I've chosen to assign the result of realloc to another pointer, because I have to check for failure and avoid a memory leak. Is that good practice, or should I have done something else?

EDIT: after deleting the if-check, I've seen that this realloc triggers a breakpoint during debugging.

CodePudding user response:

There are multiple problems in your code:

  • url[i] != "/" is incorrect, it is a type mismatch. You should compare the character url[i] with a character constant '/', not a string literal "/".

  • char *s = realloc(str, sizeof(char) + 1); reallocates only to size 2, not to the current length plus 1 (see the sketch after this list for the usual growth pattern).

  • you do not advance any pointer, nor do you use the index variable, so no character from url is ever copied.

  • instead of using malloc and realloc, you should first compute the length of the server name and allocate the array with the correct size directly.
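
If you do want to keep the character-by-character approach from the question, the usual pattern (and the answer to your point (2)) is to grow the buffer to the current length plus one on every iteration, and to assign the result of realloc to a temporary pointer first: on failure realloc returns NULL but leaves the original block allocated, so you can still free it. Here is a minimal sketch of that pattern, assuming the URL always starts with "http://" as in the question (read_website_grow is just an illustrative name):

// needs <stdlib.h> for malloc, realloc and free
char *read_website_grow(const char *url) {
    size_t len = 0;
    char *str = malloc(1);                   // room for the terminating '\0'
    if (str == NULL)
        return NULL;
    str[0] = '\0';
    for (size_t i = 7; url[i] != '\0' && url[i] != '/'; i++) {
        char *tmp = realloc(str, len + 2);   // current length + new char + '\0'
        if (tmp == NULL) {
            free(str);                       // realloc failed: str is still valid, release it
            return NULL;
        }
        str = tmp;
        str[len++] = url[i];
        str[len] = '\0';
    }
    return str;
}

That said, computing the length up front is simpler and avoids the repeated reallocations.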

Here is a modified version:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *read_website(const char *url) {
    // skip the protocol part
    if (!strncmp(url, "http://", 7))
        url += 7;
    else if (!strncmp(url, "https://", 8))
        url += 8;
    // compute the length of the domain name, stop at ':' or '/'
    size_t n = strcspn(url, "/:");
    // return an allocated copy of the substring
    return strndup(url, n);
}

int main(void) {
    char s[] = "http://www.wikipedia.org/"; 
    char *str = read_website(s);
    printf("%s -> %s\n", s, str);
    free(str);
    return 0; 
}

strndup() is a POSIX function available on many systems, and it will be part of the next version of the C Standard. If it is not available on your target, here is a simple implementation:

char *strndup(const char *s, size_t n) {
    char *p;
    size_t i;
    for (i = 0; i < n && s[i]; i++)
        continue;
    p = malloc(i + 1);
    if (p) {
        memcpy(p, s, i);
        p[i] = '\0';
    }
    return p;
}

CodePudding user response:

The assignment doesn't say the returned string must be of minimal size, and the amount of memory a URL occupies is small anyway.

Building on chqrlie's solution, I'd start by finding the beginning of the domain name (skipping the protocol portion), then duplicate the rest of the string and truncate the result. Roughly:

// sketch of the body of read_website, assuming the parameter is const char *url
const char *prot[] = { "http://", "https://" };
const char *s = url;
for( int i = 0; i < 2; i++ ) {
  if( 0 == strncmp(s, prot[i], strlen(prot[i])) ) {
     s += strlen(prot[i]);
     break;
  }
}
char *output = strdup(s);
if( output ) {
  size_t n = strcspn(output, "/:");
  output[n] = '\0';
}
return output; 

The returned pointer can still be freed by the caller, so the total "wasted" space is limited to the trailing part of the truncated URL.
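
If you did want to hand the trailing bytes back anyway, the truncation step above could be followed by a shrinking realloc. A minimal sketch of that variant (purely optional, for the reasons given above; if the shrinking realloc fails, the original, larger block is still valid, so we simply keep it):

if( output ) {
  size_t n = strcspn(output, "/:");
  output[n] = '\0';
  char *smaller = realloc(output, n + 1);   // give back the spare bytes
  if( smaller != NULL )
     output = smaller;
}
return output;

Whether the allocator actually releases the spare bytes is up to the implementation; either way the caller frees output exactly as before.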
