How to return multi dimension array properly after using strtok()

Time:10-11

I am trying to write a program that takes a string and a delimiter, breaks the string into a series of tokens using that delimiter, and finally stores each token in a multidimensional array.

The code:

char** get_tokens(const char* str, char delim) {
  int i=4;
  
  char *ar[i];

  const char* delim2 = &delim;//to put this as a parameter in strtok()
  char strcopy[50];
  strcpy(strcopy,str);
  char* token;
  token = strtok(strcopy,delim2);//break str into pieces by ' 'sign

  int k;
  for (k=0;k<i;k++){
    ar[k] = token;

    token = strtok(NULL,delim2);

  }

  int n;
   for (n=0;n<i;n++)
     printf("ar[%d] is %s\n",n,ar[n]);
  
  return ar;
     
}

int main(){
  
    char** tokens = get_tokens("  All Along the Watchtower  ", ' ');

    for (int k =0;k<4;k++){
      printf("tokens[%d] is this %s\n",k,tokens[k]);
    }

  return 0;
}

The function strtok() is working properly, as the output inside get_tokens() is

ar[0] is All
ar[1] is Along
ar[2] is the
ar[3] is Watchtower

But in the main function, where I want the array tokens to hold exactly the same result, the output is

tokens[0] is this All
tokens[1] is this (null)
tokens[2] is this 
tokens[3] is this (null)

So I guess it is not returning ar properly, since everything after index 0 comes back as null or empty.

Also, I am getting a warning saying:

warning: address of stack memory associated with local variable 'ar' returned [-Wreturn-stack-address]
  return ar;
         ^~
1 warning generated.

Do you know why this happens?

The whole output is

ar[0] is All
ar[1] is Along
ar[2] is the
ar[3] is Watchtower
tokens[0] is this All
tokens[1] is this (null)
tokens[2] is this 
tokens[3] is this (null)

Answer:

There is more than one fundamental problem with your code, alas.

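  • You are endeavoring to return a VLA. It is a local array, and its tokens point into another local buffer (strcopy); both cease to exist when the function returns. That doesn’t work; don’t do it.
  • You are not null-terminating your delimiter string: &delim points at a single char, not at a string.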
  • Your function cannot self-determine the number of tokens.
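
For reference, a minimal sketch of a strtok()-based fix for those three points might look like the following. The max_tokens parameter and the free_tokens() helper are inventions for this illustration, not part of the original code. Each token is copied to the heap so it outlives the function, the delimiter is passed as a properly terminated string, and the array is NULL-terminated so the caller can count the tokens.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: returns a malloc'd, NULL-terminated array of malloc'd token
   copies, or NULL on allocation failure. max_tokens is an assumed cap. */
char **get_tokens(const char *str, char delim, size_t max_tokens)
{
    char   delims[2] = { delim, '\0' };          /* a properly terminated delimiter string */
    char  *copy      = malloc(strlen(str) + 1);  /* strtok() writes into its input */
    char **tokens    = malloc((max_tokens + 1) * sizeof *tokens);
    size_t n         = 0;

    if (!copy || !tokens) { free(copy); free(tokens); return NULL; }
    strcpy(copy, str);

    for (char *t = strtok(copy, delims); t && n < max_tokens; t = strtok(NULL, delims)) {
        tokens[n] = malloc(strlen(t) + 1);
        if (!tokens[n]) {                        /* undo everything on failure */
            while (n > 0) free(tokens[--n]);
            free(tokens);
            free(copy);
            return NULL;
        }
        strcpy(tokens[n++], t);
    }
    tokens[n] = NULL;                            /* terminator lets the caller count tokens */
    free(copy);
    return tokens;
}

/* Releases every token and then the array itself. */
void free_tokens(char **tokens)
{
    if (!tokens) return;
    for (char **p = tokens; *p; ++p) free(*p);
    free(tokens);
}
```

Called as get_tokens("  All Along the Watchtower  ", ' ', 4), this should give the four tokens followed by NULL; hand the result to free_tokens() when done.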

However, I thought this to be a fun programming exercise and rolled up a generalized solution. Here is the header with documentation and totally optional default argument macro magic (thanks to Braden Steffaniak’s excellent macro mojo here):

split.h

// Copyright 2021 Michael Thomas Greer.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
//  https://www.boost.org/LICENSE_1_0.txt )

/*

  char **
  split(
    const char * s,
    const char * sep         = NULL,  // --> whitespace: " \f\n\r\v\t"
    bool         is_dup_s    = true,  // --> non-destructive of source?
    int          granularity = 0      // --> default granularity
  );

  Function:
    Split a string into tokens, much like strtok(). Tokens are delimited
    by the argument separator characters. Empty tokens are not returned.

  Returns:
    • a NULL-terminated array of pointers to the tokens in s.
      You must free() the resulting array. Do NOT free individual tokens!
    • NULL on failure (due to a memory re/allocation failure).

  Arguments:
    s           • The source string to tokenize.
    sep         • Separator characters. Defaults to all whitespace.
    is_dup_s    • By default the source string is duplicated so that
                  the tokenization can be done non-destructively (for
                  example, on literals). If you don't care about the
                  source, or the source is sufficiently large that
                  duplication could be a problem, then turn this off.
    granularity • The algorithm works by building a table of token
                  indices. This is the growth size of that table.
                  It defaults to a reasonably small size. But if you
                  have a good idea of the number of tokens you will
                  typically generate, set it to that.

  Uses totally-optional macro magic for elided default arguments.
  No macros == no elided default argument magic. (You can still specify
  default values for arguments, though.)
*/

#ifndef DUTHOMHAS_SPLIT_H
#define DUTHOMHAS_SPLIT_H

#include <stdbool.h>

char ** split( const char * s, const char * sep, bool is_dup_s, int granularity );

// https://stackoverflow.com/a/24028231/2706707
#define SPLIT_GLUE(x, y) x y

#define SPLIT_RETURN_ARG_COUNT(_1_, _2_, _3_, _4_, count, ...) count
#define SPLIT_EXPAND_ARGS(args) SPLIT_RETURN_ARG_COUNT args
#define SPLIT_COUNT_ARGS_MAX5(...) SPLIT_EXPAND_ARGS((__VA_ARGS__, 4, 3, 2, 1, 0))

#define SPLIT_OVERLOAD_MACRO2(name, count) name##count
#define SPLIT_OVERLOAD_MACRO1(name, count) SPLIT_OVERLOAD_MACRO2(name, count)
#define SPLIT_OVERLOAD_MACRO(name, count) SPLIT_OVERLOAD_MACRO1(name, count)

#define SPLIT_CALL_OVERLOAD(name, ...) SPLIT_GLUE(SPLIT_OVERLOAD_MACRO(name, SPLIT_COUNT_ARGS_MAX5(__VA_ARGS__)), (__VA_ARGS__))

#define split(...) SPLIT_CALL_OVERLOAD( SPLIT, __VA_ARGS__ )
#define SPLIT1(s)           (split)( s, NULL, true, 0 )
#define SPLIT2(s,sep)       (split)( s, sep,  true, 0 )
#define SPLIT3(s,sep,ids)   (split)( s, sep,  ids,  0 )
#define SPLIT4(s,sep,ids,g) (split)( s, sep,  ids,  g )

#endif
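
The default-argument trick in the header can be illustrated in miniature. The sketch below uses a hypothetical two-parameter function scale() (not part of split.h) and plain C99 variadic macros. It leaves out the SPLIT_GLUE-style indirection, which is presumably there for preprocessors that expand __VA_ARGS__ differently; that is an assumption, so treat this as the idea rather than a portable drop-in.

```c
/* Hypothetical function, for illustration only: multiply x by factor. */
int scale(int x, int factor)
{
    return x * factor;
}

/* Picks its third argument. With one caller argument that is SCALE1,
   with two it is SCALE2. The trailing 0 at the call site keeps the
   variadic part non-empty. */
#define SCALE_PICK(a1, a2, name, ...) name

#define SCALE1(x)    (scale)((x), 10)    /* factor defaults to 10 */
#define SCALE2(x, f) (scale)((x), (f))

/* The parenthesised (scale) above names the real function, because a
   function-like macro is only expanded when followed directly by '('. */
#define scale(...) SCALE_PICK(__VA_ARGS__, SCALE2, SCALE1, 0)(__VA_ARGS__)
```

With these definitions scale(3) expands to a call with factor 10, while scale(3, 4) passes both arguments through unchanged.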

And here is the important bit:

split.c

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

char ** split( const char * s, const char * sep, bool is_dup_s, int granularity )
{
  char **  result;
  typedef size_t slot[ 2 ];
  int      max_slots  = (granularity > 0) ? granularity : 32;
  int      num_slots  = 0;
  size_t   index      = 0;
  slot   * slots      = (slot *)malloc( sizeof(slot) * max_slots );

  if (!slots) return NULL;
  if (!sep) sep = " \f\n\r\v\t";

  // Find all tokens
  while (s[ index ])
  {
    index += strspn( s + index, sep );  // skip any leading separators --> beginning of next token
    if (!s[ index ]) break;             // no more tokens

    if (num_slots == max_slots)  // assert: slots available
    {
      // grow by the granularity, or by the default of 32 when none was given
      slot * new_slots = (slot *)realloc( slots, sizeof(slot) * (max_slots += (granularity > 0) ? granularity : 32) );
      if (!new_slots) { free( slots ); return NULL; }
      slots = new_slots;
    }

    slots[ num_slots   ][ 0 ] = index;                               // beginning of token
    slots[ num_slots++ ][ 1 ] = index += strcspn( s + index, sep );  // skip non-separators --> end of token
  }

  // Allocate and build the string array
  result = (char **)malloc( sizeof(char *) * ++num_slots + (is_dup_s ? index + 1 : 0) );
  if (result)
  {
    char * d = is_dup_s ? (char *)(&result[ num_slots ]) : (char *)s;
    if (is_dup_s) memcpy( d, s, index + 1 );

    result[--num_slots ] = NULL;

    while (num_slots --> 0)
    {
      result[ num_slots ] = d   slots[ num_slots ][ 0 ];
      d[ slots[ num_slots ][ 1 ] ] = '\0';
    }
  }

  free( slots );
  return result;
}

And here is some example code using it:

a.c

#include <stdio.h>
#include <stdlib.h>  // free()
#include "split.h"

void test( const char * s, char ** ss )
{
  printf( "%s\n", s );
  for (int n = 0;  ss[n];  ++n)
    printf( "  %d: \"%s\"\n", n, ss[n] );
  free( ss );
  printf( "\n" );
}

#define TEST(x) test( #x , x )

int main()
{
  TEST( split( "Hello world! \n" ) );
  TEST( split( " 2, 3, 5, 7, 11, ",  /*sep*/", " ) );
  TEST( split( "::::", ":" ) );
  TEST( split( "", ":" ) );
  TEST( split( "", NULL, true, 15 ) );
  TEST( split( "a b c d e", NULL ) );
  TEST( split( " - a---b   c - d - ", " -", true, 1 ) );

  char s[] = "Never trust a computer you can't throw out a window. --Abraham Lincoln";
  printf( "s = \"%s\"\n", s );
  TEST( split( s, " -.", false ) );
  printf( "Modified s will print only the first token: \"%s\"\n", s );
}

Tested on Windows 10 using

  • MSVC 2019 (19.21.27702.2) cl /EHsc /W4 /Ox a.c split.c
  • LLVM/Clang 9.0.0 clang -Wall -Wextra -pedantic-errors -O3 -o a.exe a.c split.c

and on Ubuntu 20.04 using

  • GCC 9.3.0 gcc -Wall -Wextra -pedantic-errors -O3 a.c split.c
  • Clang 10.0.0 clang -Wall -Wextra -pedantic-errors -O3 a.c split.c

Explain this madness!

I realize you are a beginner and this is quite a bit more than you may have expected. Don’t worry, playing with strings and dynamically-allocated memory is actually rather difficult. Lots of people get it wrong all the time.

The trick used here was to build a temporary list of indices into the string for the beginning and end of each token, using the strspn() and strcspn() library functions — the very same functions strtok() uses internally. The list can grow dynamically as needed.
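
To see that scan in isolation, here is a small sketch that only reports where each token begins and ends, without allocating or modifying anything. The helper name print_token_spans() is made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Sketch only: prints the [begin, end) index pair of every token in s
   and returns how many tokens were found. */
size_t print_token_spans(const char *s, const char *sep)
{
    size_t index = 0;
    size_t count = 0;

    while (s[index]) {
        index += strspn(s + index, sep);     /* skip separators: start of next token */
        if (!s[index]) break;                /* only separators were left */

        size_t begin = index;
        index += strcspn(s + index, sep);    /* skip non-separators: end of token */

        printf("token %zu: [%zu, %zu) \"%.*s\"\n",
               count, begin, index, (int)(index - begin), s + begin);
        ++count;
    }
    return count;
}
```

strspn() returns the length of the leading run of characters that are in sep, and strcspn() the length of the leading run of characters that are not, so alternating the two walks the string one token at a time.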

Once that list is complete, we allocate enough memory to store pointers for every token + 1 (for the NULL pointer at the end of the array), optionally followed by a copy of the source string.

Then we simply compute the pointer values (addresses) of the tokens indexed in the string, modifying the string just like strtok() does to null-terminate each token.
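
The layout of that allocation can be sketched on a fixed-size case. The helper pack_two() below is hypothetical and is not part of split.c; it packs a NULL-terminated pointer table and the character data it points at into one malloc'd block.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: copies two strings into ONE block laid out as
   [ptr0][ptr1][NULL][text of a][text of b], so a single free() releases it all. */
char **pack_two(const char *a, const char *b)
{
    size_t na = strlen(a) + 1;               /* sizes include the terminators */
    size_t nb = strlen(b) + 1;

    char **table = malloc(3 * sizeof *table + na + nb);
    if (!table) return NULL;

    char *text = (char *)&table[3];          /* character storage starts right after the table */
    memcpy(text, a, na);
    memcpy(text + na, b, nb);

    table[0] = text;                         /* pointers into the same block */
    table[1] = text + na;
    table[2] = NULL;                         /* end-of-array marker */
    return table;
}
```

Because char has no alignment requirement, the text can sit immediately behind the pointer table, and the caller only ever needs free( table ).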

The result is a single block of memory, so it can be passed directly to free() when the user is done iterating over the array. The example test function iterates over the array using an integer index, but a string iterator (a pointer to char pointer) works just as well:

char ** tokens = split( my_string, my_delimiters );  // Get tokens
for (char ** ptoken = tokens;  *ptoken;  ++ptoken)   // For each token
  printf( "  %s\n", *ptoken );                       //   (do something with it)
free( tokens );                                      // Free tokens

And that’s it!
