C++: Split char array without use of any library

Time:01-27

I've been running into a weird issue: the split code produces the correct output when I printf inside the function, but the output is wrong when I read the results back from the instance afterwards.

Question: How do I get the correct output when calling it on an instance? (See usage below.)

Here is the code:

typedef struct SplitText
{
    int splitLen;
    char* splitTxt[100];
    char* subTxt(char* text, int index, int len)
    {
        char subTxt_[1000];
        int count = 0;
        for (int i = 0; i < 1000; i++)
            subTxt_[i] = '\0';
        for (int i = index; i < index + len; i++)
            subTxt_[count++] = text[i];

        return subTxt_;
    }

    void split(char* text, char sep)
    {
        char separator[3] = { '<', sep, '>' };
        int textLen = strlen(text);
        int splitIndex = 0;
        int splitCount = 0;
        for (int t = 0; t < textLen; t++)
        {
            if (text[t] == separator[0] && text[t + 1] == separator[1] && text[t + 2] == separator[2])
            {
                if (splitIndex != 0)
                    splitIndex += 3;
                splitTxt[splitCount] = subTxt(text, splitIndex, t - splitIndex);
                splitIndex = t;
                
                //correct output
                printf(splitTxt[splitCount]);
                printf("\n");

                splitCount++;
            }
        }
        splitLen = splitCount;
    }
}SplitText;

Usage:

SplitText st;
st.split("testing<=>split<=>function<=>", '=');
for (int i = 0; i < st.splitLen; i++)
{
    //incorrect output
    printf(st.splitTxt[i]);
    printf("\n");
}
printf("--------\n");

CodePudding user response:

This:

    char* subTxt(char* text, int index, int len)
    {
        char subTxt_[1000];

        ...

        return subTxt_;
    }

This is undefined behavior. subTxt_ is a local array, so it ceases to exist when subTxt returns; the returned pointer dangles, and reading through it is what produces weird results like this.

The typical thing that corrupts the contents behind that pointer is the next function call: the stack memory that subTxt_ occupied gets reused for that function's local variables.

Better:

    char* subTxt(char* text, int index, int len)
    {
        char* subTxt_ = new char[1000];

        ...

        return subTxt_;
    }

And then make sure whoever invokes subTxt remembers to delete [] on the returned pointer.

Or just use std::string and be done with it (unless this is an academic exercise).
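
As a minimal sketch of the std::string route (the name sub_text and its std::size_t parameters are illustrative, not from the question), returning the substring by value hands the caller its own copy, so nothing dangles and there is nothing to delete[]:

```cpp
#include <cstddef>
#include <string>

// Hypothetical replacement for subTxt: returns an owning copy of the
// characters in [index, index + len), so the result outlives the call.
std::string sub_text(const std::string& text, std::size_t index, std::size_t len)
{
    // substr clamps len to the end of the string and throws
    // std::out_of_range if index is past the end.
    return text.substr(index, len);
}
```

Storing the results would then mean changing splitTxt from an array of char* to something that owns its strings, for example a std::vector<std::string>.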

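Also, this loop reads up to two characters beyond index t:

    for (int t = 0; t < textLen; t++)
    {
        if (text[t] == separator[0] && text[t + 1] == separator[1] && text[t + 2] == separator[2])

When t == textLen - 1, text[t + 1] is the terminating null and text[t + 2] is past the end of the array. With a null-terminated string and a non-null sep, short-circuit evaluation of && happens to stop before the out-of-bounds read, but it is safer not to rely on that. Change it to: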

    for (int t = 2; t < textLen; t++)
    {
        if (text[t - 2] == separator[0] && text[t - 1] == separator[1] && text[t] == separator[2])

Adjust the other uses of t inside the block to match: t now indexes the last character of the separator, so the separator starts at t - 2.
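
Putting both fixes together, here is a rough sketch of the question's split for the "<sep>" pattern. It assumes std::string and std::vector are acceptable, and the name split_on_marker is made up for illustration. It bounds the loop with t + 2 < size rather than starting at t = 2, which is another way to keep all three reads inside the string:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text at every "<sep>" marker, e.g. "<=>".
// Like the code in the question, text after the last marker is not emitted.
std::vector<std::string> split_on_marker(const std::string& text, char sep)
{
    std::vector<std::string> parts;
    std::size_t start = 0;  // index of the first character of the current piece
    std::size_t t = 0;

    // t + 2 < size() guarantees text[t], text[t + 1] and text[t + 2] all exist.
    while (t + 2 < text.size())
    {
        if (text[t] == '<' && text[t + 1] == sep && text[t + 2] == '>')
        {
            parts.push_back(text.substr(start, t - start));  // owning copy
            t += 3;      // skip past the marker
            start = t;   // the next piece begins right after it
        }
        else
        {
            ++t;
        }
    }
    return parts;
}
```

Each element of the returned vector owns its characters, so the results stay valid after the function returns.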

CodePudding user response:

Well you can create a splitstring function instead of a struct/class.

Anyway your code still looks quite "C" like with its fixed size char arrays. This will limit the usability and stability (out-of-bound array bugs).

Strings in C++ are usually of type std::string, and C++17 adds std::string_view to make views on a string (so no data gets copied, but it also means your string_view is only valid for as long as the string it is viewing lives).

If you don't know the number of substrings up front, you should not use a fixed-size array but a std::vector, which can resize internally if needed.

This is what a split_string function would look like in current C++. Note that the code expresses what it is doing more clearly than "C"-style programming, which mostly shows how it is done.

#include <algorithm>    // std::min
#include <string_view>
#include <vector>

std::vector<std::string_view> split_string(std::string_view string, std::string_view delimiters)
{
    std::vector<std::string_view> substrings;
    if(delimiters.empty())
    {
        substrings.emplace_back(string);
        return substrings;
    }

    auto start_pos = string.find_first_not_of(delimiters);
    auto end_pos = start_pos;
    auto max_length = string.length();

    while(start_pos < max_length)
    {
        end_pos = std::min(max_length, string.find_first_of(delimiters, start_pos));

        if(end_pos != start_pos)
        {
            substrings.emplace_back(&string[start_pos], end_pos - start_pos);
            start_pos = string.find_first_not_of(delimiters, end_pos);
        }
    }

    return substrings;
}
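
One thing to be aware of: split_string treats every character in delimiters as a separator of its own (that is what find_first_of does), so passing "<=>" would also split on a lone '=' inside a word. If the whole sequence should act as the separator, as in the question, a variant along these lines might be closer. This is only a sketch, and split_on_sequence is a made-up name:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant: splits text wherever the complete separator
// sequence occurs. Empty pieces are skipped, as in split_string.
std::vector<std::string_view> split_on_sequence(std::string_view text, std::string_view separator)
{
    std::vector<std::string_view> parts;
    if (separator.empty())
    {
        parts.push_back(text);
        return parts;
    }

    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = text.find(separator, start);
        if (pos == std::string_view::npos)
            break;
        if (pos != start)
            parts.push_back(text.substr(start, pos - start));
        start = pos + separator.size();  // continue after the separator
    }

    // Keep whatever follows the last separator, if anything.
    if (start < text.size())
        parts.push_back(text.substr(start));
    return parts;
}
```

As with split_string, the returned views point into the caller's text, so they are only valid while that text is alive.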

CodePudding user response:

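Take a look at std::string_view. It avoids allocating memory and has a built-in substr function. Just be careful when printing a view with printf: a string_view is not necessarily null-terminated, so "%s" keeps reading past the end of the view, typically printing the rest of the underlying string. Pass the length explicitly with "%.*s" instead. See the printf documentation.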

for(auto view : container_with_string_views)
   printf("%.*s", (int)view.size(), view.data());
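
To make the "%.*s" form concrete, here is a small self-contained sketch. The helper name format_view and the 256-byte buffer are arbitrary choices for illustration. It formats into a buffer with snprintf so the result can be inspected, but the format string works the same way with printf:

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: formats exactly the characters the view covers.
// Output longer than the buffer is truncated by snprintf.
std::string format_view(std::string_view view)
{
    char buffer[256];
    // "%.*s" reads the maximum number of characters as an int argument
    // that comes before the pointer, so no terminating null is needed.
    std::snprintf(buffer, sizeof buffer, "%.*s",
                  static_cast<int>(view.size()), view.data());
    return buffer;
}
```

For a view covering the first seven characters of "testing<=>split", this yields "testing", whereas "%s" on view.data() would run on to the end of the underlying string.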