C Remove substring between two characters

Time:01-11

I have two possible strings that would be pointed to by a char *:

char *s = "this is a string";
char *s = "this is a string [this is more string]";

I want to be able to remove the brackets and their contents from the string if they exist. What is the best way to do that in C?

Thanks!

CodePudding user response:

First, you must have access to memory you are permitted to modify. For hysterical raisins, C still allows you to declare pointers to constant character arrays as non-const... but you still can’t modify them.

Instead, declare yourself an array which is initialized by the constant data.

char s[] = "hello world";

That gives you a mutable array of 11+1 characters (a copy of the immutable source string).

Often, however, we want to be able to work on our strings, so we need to make sure there is enough room to play with them.

char s[100] = "hello world";  // string capacity == 99+1. In use == 11+1.
s[6] = '\0';              // now string contains "hello ". In use == 6+1.
strcat( s, "everyone" );  // now string contains "hello everyone". In use == 14+1.

To remove content from a string, you must first identify where the start and stop is:

char s[] = "this is a string [this is more string] original string";
char * first = strchr( s, '[' );                      // find the '['
char * last  = first ? strchr( first, ']' ) : NULL;   // find the ']' that follows it
if (first && last)  // if both were found...
  // ...copy the end of the string (and its nul-terminator) over the part to be removed
  memmove( first, last+1, strlen(last+1)+1 );
// s now contains "this is a string  original string"

Notice all those +1s in there? Be mindful:

  • of where you are indexing the string (you want to copy from after the ']' to the end of the string)
  • that strings must end with a nul-terminator — a zero-valued character (which we should also copy, so we added one to the strlen).

Remember, a string is just an array of characters, where the last used character is immediately followed by a zero-value character (the nul-terminator). All the methods you use to handle arrays of any other kind apply to handling strings (AKA arrays of characters).

CodePudding user response:

This is one of those "fun" problems that attracts more than its share of answers.

Full credit to other answers for noting that attempts to modify a "string literal" trigger undefined behavior (UB). (String literals are often stored in an immutable "read only" region of memory.) (Thanks to @chux-reinstatemonica for clarification.)

Taking this to the next level, below is a bit of code that handles multiple regions bounded by a pair of delimiters ('[' & ']'), including nested ones. The code stays simple by using cut as a nesting-depth counter: it goes up at each '[', down at each ']', and characters are copied only while it is zero. NB: It is presumed that the source string is "well formed".

#include <stdio.h>

int main( void ) {
    char *wb =
         "Once [upon] a [time] there [lived a] princess. "
         "She [ was [a hottie]]. The end.";
    char wo[100]; // big enough

    int cut = 0;
    size_t s = 0, d = 0; 

    for( ; wb[s]; s++ ) {
        cut += ( wb[s] == '[' ); // entering region to cut?
        if( !cut ) wo[ d++ ] = wb[ s ]; // not cutting, so copy
        cut -= ( wb[s] == ']' ); // exiting region that was cut?
        if( cut < 0 ) cut = 0; // in case of spurious ']'
    }
    wo[ d ] = '\0'; // terminate shortened string

    puts( wb );
    puts( wo );

    return 0;
}
Output:

Once [upon] a [time] there [lived a] princess. She [ was [a hottie]]. The end.
Once  a  there  princess. She . The end.

It now becomes a small challenge to, perhaps, remove multiple consecutive spaces in the output array. This could quite easily be done on the fly, and is left as an exercise.

One begins to see how something like this could be extended to make an "infix calculator program". Always something new!

CodePudding user response:

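Use the strchrnul function (a GNU extension declared in <string.h>). It returns a pointer to the first '[' or, if there is none, to the terminating '\0', so writing a '\0' there truncates the string at the bracket. This suits the question's case, where the bracketed part is at the end of the string: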

    *(strchrnul(s, '[')) = '\0';

You may need to define the _GNU_SOURCE macro before including <string.h>. It won't work on string literals, because they must not be modified.

CodePudding user response:

String literals are read-only and should not be modified. Attempting to modify one leads to undefined behavior. They can be used to initialize char arrays, though.

char s[] = "this is a string [this is more string]";

If you want to remove the brackets and the characters between them, you need to first ascertain that the string contains both using strchr. Having stored those pointers, you can use pointer arithmetic with strncpy and strcat to create a new string minus the parts you want to exclude.

#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "this is a string [this is more string] foo";
    char *start = NULL, *end = NULL;

    if ((start = strchr(s, '[')) == NULL 
        || (end = strchr(s, ']')) == NULL) {
        printf("Invalid string.\n");
        return 1;
    }

    char s2[strlen(s) + 1];

    size_t start_length = start - s;

    strncpy(s2, s, start_length);
    s2[start_length] = '\0';
    strcat(s2, &end[1]);

    printf(">%s<\n", s2);

    return 0;
}

Output:

>this is a string  foo<

Or, we can simply modify the original string in place by copying from the end pointer + 1 to the start pointer.

#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "this is a string [this is more string]";

    char *start = NULL, *end = NULL;

    if ((start = strchr(s, '[')) == NULL 
        || (end = strchr(s, ']')) == NULL) {
        printf("Invalid string.\n");
        return 1;
    }

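    // Source and destination overlap, so use memmove (strcpy would be UB);
    // the +1 on the length also moves the terminating '\0'.
    memmove(start, end + 1, strlen(end + 1) + 1);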

    printf(">%s<\n", s);

    return 0;
}

A further improvement would be to only look for the closing character after the starting character.

#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "this is a string [this is more string]";

    char *start = NULL, *end = NULL;

    if ((start = strchr(s, '[')) == NULL 
        || (end = strchr(start + 1, ']')) == NULL) {
        printf("Invalid string.\n");
        return 1;
    }

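    // Source and destination overlap, so use memmove (strcpy would be UB);
    // the +1 on the length also moves the terminating '\0'.
    memmove(start, end + 1, strlen(end + 1) + 1);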

    printf(">%s<\n", s);

    return 0;
}