I have a text file containing multiple strings of different lengths that I need to split into tokens.
Is strtok the best way to split these strings, and how can I count the tokens?
Example of strings from the file:
Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231
Emma Watson#1169875#COMP336#COMP2421#COMP231#COMP338#CCOMP3351
Kevin Hart#1146542#COMP142#COMP242#COMP231#COMP336#COMP331#COMP334
George Clooney#1164561#COMP336#COMP2421#COMP231#COMP338#CCOMP3351
Matt Damon#1118764#COMP439#COMP4232#COMP422#COMP311#COMP338
Johnny Depp#1019876#COMP311#COMP242#COMP233#COMP3431#COMP333#COMP432
CodePudding user response:
Generally, using strtok is a good solution to the problem:
#include <stdio.h>
#include <string.h>

int main( void )
{
    char line[] =
        "Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
    char *p;
    int num_tokens = 0;

    /* the first call takes the string, later calls take NULL */
    p = strtok( line, "#" );
    while ( p != NULL )
    {
        num_tokens++;
        printf( "Token #%d: %s\n", num_tokens, p );
        p = strtok( NULL, "#" );
    }
}
This program has the following output:
Token #1: Emma Stone
Token #2: 1169876
Token #3: COMP242
Token #4: COMP333
Token #5: COMP336
Token #6: COMP133
Token #7: COMP231
However, one disadvantage of using strtok is that it is destructive: it modifies the string by replacing the # delimiters with terminating null characters. If you do not want this, you can use strchr instead:
#include <stdio.h>
#include <string.h>

int main( void )
{
    const char *const line =
        "Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
    const char *p = line, *q;
    int num_tokens = 1;

    while ( ( q = strchr( p, '#' ) ) != NULL )
    {
        /* %.*s expects an int length, so cast the pointer difference */
        printf( "Token #%d: %.*s\n", num_tokens, (int)( q - p ), p );
        num_tokens++;
        p = q + 1;
    }

    /* the last token ends at the string terminator, not at a '#' */
    printf( "Token #%d: %s\n", num_tokens, p );
}
This program has identical output to the first program:
Token #1: Emma Stone
Token #2: 1169876
Token #3: COMP242
Token #4: COMP333
Token #5: COMP336
Token #6: COMP133
Token #7: COMP231
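If only the number of tokens is needed, the same strchr idea can be reduced to a counting loop. The following is a minimal sketch; the name count_fields is hypothetical and not part of any library. Note one difference from strtok: strtok skips empty tokens (for example between two adjacent # characters), whereas this sketch counts them as fields.

```c
#include <string.h>

/* Hypothetical helper: counts delimiter-separated fields without
   modifying s. Empty fields (e.g. between "##") are counted too.
   Assumes delim is not the null character. */
size_t count_fields( const char *s, char delim )
{
    size_t count = 1;  /* a string without any delimiter is one field */

    while ( ( s = strchr( s, delim ) ) != NULL )
    {
        count++;
        s++;  /* continue searching after the delimiter just found */
    }
    return count;
}
```

Called on the example line with '#' as the delimiter, this returns 7, which matches the number of tokens printed above.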
Another disadvantage of strtok is that it is not reentrant or thread-safe, whereas strchr is. However, some platforms provide a function strtok_r, which does not have these disadvantages. That function still has the disadvantage of being destructive, though.
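As a rough sketch of how strtok_r could be used here, assuming a POSIX platform that provides it (the helper name print_tokens_r is made up for this example): the extra third argument stores the position between calls, so no hidden global state is involved.

```c
/* Assumption: some systems need this macro defined to expose strtok_r */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits line in place on '#', prints each token
   and returns the number of tokens found. Modifies line. */
int print_tokens_r( char *line )
{
    char *saveptr;  /* keeps strtok_r's position between calls */
    int num_tokens = 0;

    for ( char *p = strtok_r( line, "#", &saveptr );
          p != NULL;
          p = strtok_r( NULL, "#", &saveptr ) )
    {
        num_tokens++;
        printf( "Token #%d: %s\n", num_tokens, p );
    }
    return num_tokens;
}
```

Because each caller supplies its own saveptr, two tokenizations can be in progress at the same time, for example in nested loops or in different threads.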
CodePudding user response:
Yes, you should use strtok to split these strings.
As for "how can I count the tokens": you can simply add a counter inside the while loop and increment it by one in each iteration to get the total number of tokens.
#include <stdio.h>
#include <string.h>

int main(void) {
    char string[] = "Hello world this is a simple string";
    char *token = strtok(string, " ");
    int count = 0;

    while (token != NULL) {
        count++;  /* one more token found */
        token = strtok(NULL, " ");
    }

    printf("Total number of tokens = %d\n", count);
    return 0;
}
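Since the strings in the question come from a text file, the counter can be applied to each line read with fgets. This is only a sketch under some assumptions: the helper names are hypothetical, the delimiter is fixed to #, and no line is longer than 255 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the '#'-separated tokens in one line.
   Modifies line, because strtok writes null characters into it. */
int count_tokens( char *line )
{
    int count = 0;

    for ( char *p = strtok( line, "#" ); p != NULL; p = strtok( NULL, "#" ) )
        count++;
    return count;
}

/* Hypothetical helper: reads fp line by line and prints the token count
   of each line. Returns the number of lines read.
   Assumes that no line is longer than 255 characters. */
int count_tokens_per_line( FILE *fp )
{
    char line[256];
    int num_lines = 0;

    while ( fgets( line, sizeof line, fp ) != NULL )
    {
        line[strcspn( line, "\n" )] = '\0';  /* strip the trailing newline */
        num_lines++;
        printf( "Line %d: %d tokens\n", num_lines, count_tokens( line ) );
    }
    return num_lines;
}
```

To use it, open the file with fopen in "r" mode, check the result for NULL, pass the FILE pointer to count_tokens_per_line and close it with fclose afterwards.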
CodePudding user response:
You can also write your own function to handle this quite trivial split:
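#include <stddef.h>
#include <stdio.h>

/* Splits str in place on delim, stores pointers to the substrings in argv
   and their number in *argc. The caller must make argv large enough. */
char **split(char *str, char **argv, size_t *argc, const char delim)
{
    *argc = 0;
    if(str && *str)
    {
        argv[0] = str;
        *argc = 1;
        while(*str)
        {
            if(*str == delim)
            {
                *str = 0;   /* terminate the current substring */
                str++;
                if(*str)
                {
                    argv[*argc] = str;
                    *argc += 1;
                }
                continue;   /* also stops cleanly after a trailing delimiter */
            }
            str++;
        }
    }
    return argv;
}

int main(void)
{
    char *argv[10];
    size_t argc;
    char str[] = "Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";

    split(str, argv, &argc, '#');
    printf("Number of substrings: %zu\n", argc);
    for(size_t i = 0; i < argc; i++)
        printf("token [%2zu] = `%s`\n", i, argv[i]);
}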
https://godbolt.org/z/b1aarnfWs
Remarks: same as strtok, it requires str to be modifiable, because str will be modified.
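One thing a hand-written split has to watch is the size of the pointer array: a line with more tokens than array elements would write past the end of the array. A possible variant with a capacity limit is sketched below. The name split_bounded and its parameters are hypothetical, and unlike the function above it skips empty substrings, as strtok does.

```c
#include <stddef.h>

/* Hypothetical variant: stores at most max substring pointers in argv
   and returns the number stored. Modifies str, like strtok. */
size_t split_bounded(char *str, char **argv, size_t max, char delim)
{
    size_t argc = 0;
    int at_token_start = 1;  /* the next non-delimiter starts a substring */

    for( ; str != NULL && *str != '\0'; str++)
    {
        if(*str == delim)
        {
            *str = '\0';     /* terminate the previous substring */
            at_token_start = 1;
        }
        else if(at_token_start)
        {
            if(argc == max)
                break;       /* argv is full: stop instead of overflowing */
            argv[argc++] = str;
            at_token_start = 0;
        }
    }
    return argc;
}
```

With `char *argv[10];`, a call such as `split_bounded(str, argv, 10, '#')` returns 7 for the example string and never stores more than 10 pointers.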