C++, how to remove char from a string

Time:03-21

I have to remove some characters from a string, but I am having problems. I found this code online, but it does not work well: it removes the characters, but it also removes the spaces.

    string messaggio = "{Questo e' un messaggio} ";
    char chars[] = {'Ì', '\x1', '\"', '{', '}', ':'};
    for (unsigned int i = 0; i < strlen(chars); ++i)
    {
        messaggio.erase(remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
    }

Can someone tell me how this code works and why it also removes the spaces?

CodePudding user response:

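Because you use strlen on your chars array. This function stops ONLY when it encounters a \0, and you inserted none... So the loop keeps reading memory past the end of your array, which is bad and could even cause a SEGFAULT. It is presumably also why the spaces disappear: one of the bytes read past the array can happen to be a space. Note that calling std::remove alone is not enough: it only moves the kept characters to the front and returns the new logical end, so the erase call is still needed.

A correction could be:

    char chars[] = {'I', '\x1', '\"', '{', '}', ':'};
    // sizeof(chars) equals the element count here because sizeof(char) == 1
    for (unsigned int i = 0; i < sizeof(chars); ++i)
    {
        // std::remove shifts the kept characters forward; erase trims the leftover tail
        messaggio.erase(std::remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
    }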
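
To answer the "how does it work" part, here is a minimal sketch of the two steps behind the erase/remove call. The helper names keptAfterRemove and eraseChar are made up for illustration and are not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: runs std::remove only and reports how many
// characters are kept. The string's size() is NOT changed by this call.
std::size_t keptAfterRemove(std::string& text, char ch)
{
    // std::remove shifts the characters to keep toward the front and
    // returns an iterator to the new logical end.
    auto newEnd = std::remove(text.begin(), text.end(), ch);
    return static_cast<std::size_t>(newEnd - text.begin());
}

// Hypothetical helper: the full erase/remove idiom for a single character.
std::string eraseChar(std::string text, char ch)
{
    // erase drops everything from the new logical end to the old end,
    // so the string actually shrinks.
    text.erase(std::remove(text.begin(), text.end(), ch), text.end());
    return text;
}
```

With the input "{a b}" and the character '{', keptAfterRemove reports 4 kept characters while size() is still 5; only eraseChar returns the shortened string "a b}".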

CodePudding user response:

Wisblade's answer is more or less correct; it just lacks some technical details.

As mentioned, strlen searches for the terminating character '\0'. Since chars does not contain one, this code invokes undefined behavior (a read past the end of the buffer). Undefined behavior means anything can happen: the code may work, may crash, or may give invalid results.

So the first step is to drop strlen and use a different way to get the size of the array.
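
One possible way to do that, as a sketch: a range-based for loop visits exactly the elements of the array, so no length computation is needed at all. The function name stripChars is hypothetical, and the non-ASCII 'Ì' is left out here on purpose so the example stays ASCII-only.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes a fixed set of ASCII characters from text.
std::string stripChars(std::string text)
{
    // There is no terminating '\0' in this array, so strlen must not be used on it.
    const char charsToRemove[] = {'\x1', '\"', '{', '}', ':'};

    // The range-based for loop iterates over exactly the array's elements.
    for (char ch : charsToRemove)
    {
        text.erase(std::remove(text.begin(), text.end(), ch), text.end());
    }
    return text;
}
```

Called with the string from the question, stripChars("{Questo e' un messaggio} ") drops the braces and keeps every space. If an explicit count is needed, std::size from the iterator header (C++17) or sizeof(array) / sizeof(array[0]) are the usual alternatives.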

There is also another problem. Your code uses a non-ASCII character: 'Ì'.

I assume that you are using Windows and Visual Studio. By default, the MSVC compiler assumes that the source file is encoded in your system locale and uses the same locale when generating the executable. Windows by default uses a single-byte encoding specific to your language (to stay compatible with very old software). Only in that case does your code have a chance to work. On platforms or configurations with a multibyte encoding such as UTF-8, this code can't work even after Wisblade's fixes.
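
To illustrate the multibyte case: in UTF-8, 'Ì' (U+00CC) is the two-byte sequence 0xC3 0x8C, so it cannot be matched one char at a time. A rough sketch of one way to handle it is to remove the whole byte sequence as a substring. The names removeSequence and kUtf8IGrave are invented for this example, and it assumes the string really holds UTF-8.

```cpp
#include <string>

// 'Ì' (U+00CC) spelled out as its two UTF-8 bytes, so the example does not
// depend on how the source file itself is encoded.
const std::string kUtf8IGrave = "\xC3\x8C";

// Hypothetical helper: removes every occurrence of the byte sequence needle.
std::string removeSequence(std::string text, const std::string& needle)
{
    if (needle.empty())
    {
        return text; // nothing to remove; also avoids an endless loop
    }

    std::string::size_type pos = 0;
    while ((pos = text.find(needle, pos)) != std::string::npos)
    {
        // Erase the whole sequence, then search again from the same position.
        text.erase(pos, needle.size());
    }
    return text;
}
```

This works on raw bytes only; it does not handle other encodings or combining-character forms of the same letter.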

Wisblade's fix can take this form (note that I changed the order of the loops: the iteration over the characters to remove is now the inner loop):

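#include <algorithm>
#include <iterator>
#include <string>

// Returns true if ch is one of the characters to strip.
// Note: 'Ì' only fits in a char with a single-byte encoding (see above).
bool isCharToRemove(char ch)
{
    constexpr char skipChars[] = {'Ì', '\x1', '\"', '{', '}', ':'};
    return std::find(std::begin(skipChars), std::end(skipChars), ch) != std::end(skipChars);
}

// Takes the string by value and returns the cleaned copy.
std::string removeMagicChars(std::string message)
{
    message.erase(
        std::remove_if(message.begin(), message.end(), isCharToRemove),
        message.end());
    return message;
}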

Let me know if you need a solution that can handle more complex text encodings.
