I am trying to write a program that takes a string and a delimiter
and breaks the string into a series of tokens using the delimiter,
finally storing each token in a multi-dimensional array.
The code:
char** get_tokens(const char* str, char delim) {
int i=4;
char *ar[i];
const char* delim2 = &delim;//to put this as a parameter in strtok()
char strcopy[50];
strcpy(strcopy,str);
char* token;
token = strtok(strcopy,delim2);//break str into pieces by ' 'sign
int k;
for (k=0;k<i;k++){
ar[k] = token;
token = strtok(NULL,delim2);
}
int n;
for (n=0;n<i;n++)
printf("ar[%d] is %s\n",n,ar[n]);
return ar;
}
int main(){
char** tokens = get_tokens(" All Along the Watchtower ", ' ');
for (int k =0;k<4;k++){
printf("tokens[%d] is this %s\n",k,tokens[k]);
}
return 0;
}
The function strtok() is working properly, as the output is
ar[0] is All
ar[1] is Along
ar[2] is the
ar[3] is Watchtower
But in the main function I want the array tokens to hold exactly the same result, and instead the output is
tokens[0] is this All
tokens[1] is this (null)
tokens[2] is this
tokens[3] is this (null)
So I guess it is not returning ar properly, as after index 0 it's returning null.
Also, I am getting a warning saying:
warning: address of stack memory associated with local variable 'ar' returned [-Wreturn-stack-address]
return ar;
^~
1 warning generated.
Do you know why this happens?
The whole output is
ar[0] is All
ar[1] is Along
ar[2] is the
ar[3] is Watchtower
tokens[0] is this All
tokens[1] is this (null)
tokens[2] is this
tokens[3] is this (null)
CodePudding user response:
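There is more than one fundamental problem with your code, alas.
- You are endeavoring to return a VLA (more precisely, a pointer to a local array). The array ceases to exist when the function returns, which is exactly what the -Wreturn-stack-address warning is telling you. That doesn’t work; don’t do it.
- You are not null-terminating your delimiter string. &delim points to a single char, not a string, so strtok() reads past it looking for the terminator.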
- Your function cannot self-determine the number of tokens.
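For reference, here is a minimal sketch of how the original get_tokens() might be repaired with those three points in mind. It is only one possible design: free_tokens() is a hypothetical helper that is not part of the question's code, and giving each token its own allocation is an assumption made for simplicity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees an array produced by get_tokens(). */
void free_tokens(char **tokens)
{
    if (!tokens) return;
    for (size_t i = 0; tokens[i]; i++)
        free(tokens[i]);
    free(tokens);
}

/* Returns a NULL-terminated, heap-allocated array of heap-allocated
   token copies, or NULL on allocation failure. */
char **get_tokens(const char *str, char delim)
{
    const char sep[2] = { delim, '\0' };   /* a real, null-terminated string */
    size_t len = strlen(str);

    char *copy = malloc(len + 1);          /* strtok() modifies its input */
    /* len characters hold at most (len + 1) / 2 tokens; one more for NULL. */
    char **tokens = malloc((len / 2 + 2) * sizeof *tokens);
    if (!copy || !tokens) { free(copy); free(tokens); return NULL; }
    memcpy(copy, str, len + 1);

    size_t n = 0;
    for (char *tok = strtok(copy, sep); tok; tok = strtok(NULL, sep)) {
        tokens[n] = malloc(strlen(tok) + 1);
        if (!tokens[n]) {                  /* tokens[n] == NULL ends the array */
            free_tokens(tokens);
            free(copy);
            return NULL;
        }
        strcpy(tokens[n++], tok);
    }
    tokens[n] = NULL;                      /* the caller can find the end */
    free(copy);
    return tokens;
}
```

Called as get_tokens(" All Along the Watchtower ", ' '), this should yield the four tokens All, Along, the and Watchtower followed by a NULL pointer. The caller releases everything with free_tokens().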
However, I thought this to be a fun programming exercise and rolled up a generalized solution. Here is the header with documentation and totally optional default argument macro magic (thanks to Braden Steffaniak’s excellent macro mojo here):
split.h
// Copyright 2021 Michael Thomas Greer.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// https://www.boost.org/LICENSE_1_0.txt )
/*
char **
split(
const char * s,
const char * sep = NULL, // --> whitespace: " \f\n\r\v\t"
bool is_dup_s = true, // --> non-destructive of source?
int granularity = 0 // --> default granularity
);
Function:
Split a string into tokens, much like strtok(). Tokens are delimited
by the argument separator characters. Empty tokens are not returned.
Returns:
• a NULL-terminated array of pointers to the tokens in s.
You must free() the resulting array. Do NOT free individual tokens!
• NULL on failure (due to a memory re/allocation failure).
Arguments:
s • The source string to tokenize.
sep • Separator characters. Defaults to all whitespace.
is_dup_s • By default the source string is duplicated so that
the tokenization can be done non-destructively (for
example, on literals). If you don't care about the
source, or the source is sufficiently large that
duplication could be a problem, then turn this off.
granularity • The algorithm works by building a table of token
indices. This is the growth size of that table.
It defaults to a reasonably small size. But if you
have a good idea of the number of tokens you will
typically generate, set it to that.
Uses totally-optional macro magic for elided default arguments.
No macros == no elided default argument magic. (You can still specify
default values for arguments, though.)
*/
#ifndef DUTHOMHAS_SPLIT_H
#define DUTHOMHAS_SPLIT_H
#include <stdbool.h>
char ** split( const char * s, const char * sep, bool is_dup_s, int granularity );
// https://stackoverflow.com/a/24028231/2706707
#define SPLIT_GLUE(x, y) x y
#define SPLIT_RETURN_ARG_COUNT(_1_, _2_, _3_, _4_, count, ...) count
#define SPLIT_EXPAND_ARGS(args) SPLIT_RETURN_ARG_COUNT args
#define SPLIT_COUNT_ARGS_MAX5(...) SPLIT_EXPAND_ARGS((__VA_ARGS__, 4, 3, 2, 1, 0))
#define SPLIT_OVERLOAD_MACRO2(name, count) name##count
#define SPLIT_OVERLOAD_MACRO1(name, count) SPLIT_OVERLOAD_MACRO2(name, count)
#define SPLIT_OVERLOAD_MACRO(name, count) SPLIT_OVERLOAD_MACRO1(name, count)
#define SPLIT_CALL_OVERLOAD(name, ...) SPLIT_GLUE(SPLIT_OVERLOAD_MACRO(name, SPLIT_COUNT_ARGS_MAX5(__VA_ARGS__)), (__VA_ARGS__))
#define split(...) SPLIT_CALL_OVERLOAD( SPLIT, __VA_ARGS__ )
#define SPLIT1(s) (split)( s, NULL, true, 0 )
#define SPLIT2(s,sep) (split)( s, sep, true, 0 )
#define SPLIT3(s,sep,ids) (split)( s, sep, ids, 0 )
#define SPLIT4(s,sep,ids,g) (split)( s, sep, ids, g )
#endif
And here is the important bit:
split.c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
char ** split( const char * s, const char * sep, bool is_dup_s, int granularity )
{
char ** result;
typedef size_t slot[ 2 ];
int max_slots = (granularity > 0) ? granularity : 32;
int num_slots = 0;
size_t index = 0;
slot * slots = (slot *)malloc( sizeof(slot) * max_slots );
if (!slots) return NULL;
if (!sep) sep = " \f\n\r\v\t";
// Find all tokens
while (s[ index ])
{
index += strspn( s + index, sep ); // skip any leading separators --> beginning of next token
if (!s[ index ]) break; // no more tokens
if (num_slots == max_slots) // assert: slots available
{
// grow by the requested granularity, or by the default of 32 if none was given
slot * new_slots = (slot *)realloc( slots, sizeof(slot) * (max_slots += ((granularity > 0) ? granularity : 32)) );
if (!new_slots) { free( slots ); return NULL; }
slots = new_slots;
}
slots[ num_slots ][ 0 ] = index; // beginning of token
slots[ num_slots++ ][ 1 ] = index += strcspn( s + index, sep ); // skip non-separators --> end of token
}
// Allocate and build the string array
result = (char **)malloc( sizeof(char *) * ++num_slots + (is_dup_s ? index + 1 : 0) ); // one extra slot for the terminating NULL
if (result)
{
char * d = is_dup_s ? (char *)(&result[ num_slots ]) : (char *)s;
if (is_dup_s) memcpy( d, s, index + 1 );
result[ --num_slots ] = NULL;
while (num_slots-- > 0)
{
result[ num_slots ] = d + slots[ num_slots ][ 0 ];
d[ slots[ num_slots ][ 1 ] ] = '\0';
}
}
free( slots );
return result;
}
And here is some example code using it:
a.c
#include <stdio.h>
#include <stdlib.h>
#include "split.h"
void test( const char * s, char ** ss )
{
printf( "%s\n", s );
for (int n = 0; ss[n]; n++)
printf( " %d: \"%s\"\n", n, ss[n] );
free( ss );
printf( "\n" );
}
#define TEST(x) test( #x , x )
int main()
{
TEST( split( "Hello world! \n" ) );
TEST( split( " 2, 3, 5, 7, 11, ", /*sep*/", " ) );
TEST( split( "::::", ":" ) );
TEST( split( "", ":" ) );
TEST( split( "", NULL, true, 15 ) );
TEST( split( "a b c d e", NULL ) );
TEST( split( " - a---b c - d - ", " -", true, 1 ) );
char s[] = "Never trust a computer you can't throw out a window. --Abraham Lincoln";
printf( "s = \"%s\"\n", s );
TEST( split( s, " -.", false ) );
printf( "Modified s will print only the first token: \"%s\"\n", s );
}
Tested on Windows 10 using
- MSVC 2019 (19.21.27702.2)
cl /EHsc /W4 /Ox a.c split.c
- LLVM/Clang 9.0.0
clang -Wall -Wextra -pedantic-errors -O3 -o a.exe a.c split.c
and on Ubuntu 20.04 using
- GCC 9.3.0
gcc -Wall -Wextra -pedantic-errors -O3 a.c split.c
- Clang 10.0.0
clang -Wall -Wextra -pedantic-errors -O3 a.c split.c
Explain this madness!
I realize you are a beginner and this is quite a bit more than you may have expected. Don’t worry, playing with strings and dynamically-allocated memory is actually rather difficult. Lots of people get it wrong all the time.
The trick used here was to build a temporary list of indices into the string for the beginning and end of each token, using the strspn() and strcspn() library functions — the very same functions strtok() uses internally. The list can grow dynamically as needed.
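To see that scanning step in isolation, here is a small sketch. The next_token() helper and its parameters are hypothetical names invented for this illustration; they are not part of split.c. It assumes *pos never exceeds the length of s.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next token in s at or after *pos.
   On success stores the half-open range [*begin, *end), advances *pos
   to the end of the token, and returns true. Returns false when only
   separators (or nothing) remain. */
bool next_token(const char *s, const char *sep, size_t *pos,
                size_t *begin, size_t *end)
{
    size_t i = *pos;
    i += strspn(s + i, sep);         /* skip leading separators */
    if (s[i] == '\0') return false;  /* no more tokens */
    *begin = i;
    i += strcspn(s + i, sep);        /* skip the token's own characters */
    *end = i;
    *pos = i;
    return true;
}
```

Calling it in a loop with pos starting at 0 visits every token without modifying the string. Recording each begin/end pair in a growable table is what the first half of split() does.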
Once that list is complete, we allocate enough memory to store pointers for every token + 1 (for the NULL pointer at the end of the array), optionally followed by a copy of the source string.
Then we simply compute the pointer values (addresses) of the tokens indexed in the string, modifying the string just like strtok() does to null-terminate each token.
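The memory layout itself (a pointer table with the character data packed right behind it) can be tried out separately. The sketch below is a simplified relative of what split() does, not the same code: pack_strings() is a hypothetical helper that copies already-separated strings rather than null-terminating tokens in place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies n strings into ONE allocation laid out as
   [ptr 0][ptr 1]...[ptr n-1][NULL][characters of all strings...].
   Returns NULL on allocation failure; release with a single free(). */
char **pack_strings(const char *const *strs, size_t n)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++)
        bytes += strlen(strs[i]) + 1;      /* each string plus its '\0' */

    char **table = malloc(sizeof(char *) * (n + 1) + bytes);
    if (!table) return NULL;

    char *text = (char *)&table[n + 1];    /* character data follows the table */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;
        memcpy(text, strs[i], len);
        table[i] = text;
        text += len;
    }
    table[n] = NULL;                       /* terminator for iteration */
    return table;
}
```

Because char data has no alignment requirement, placing it directly after the pointers is safe, and the caller never needs to know where the table ends and the text begins.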
The result is a single block of memory so it can be passed directly to free() when the user is done iterating over the array. The example test function iterated over the array using an integer index, but a string iterator (pointer to char pointer) will do also:
char ** tokens = split( my_string, my_delimiters ); // Get tokens
for (char ** ptoken = tokens; *ptoken; ++ptoken) // For each token
printf( " %s\n", *ptoken ); // (do something with it)
free( tokens ); // Free tokens
And that’s it!