How do I use getline to read from a file and then tokenize it using strtok?

I want my program to read lines from a file using getline(), and then tokenize the words using strtok(), and put them into a two-dimensional array.

I understand there are probably way better ways to do this, but I am limited by what I've learned so far and the assignment requirements.

I've tried using these threads/sites to refer to for help:

C strtok() tutorial

using getline to read from file in c

#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;

int main(int argc, char **argv)
{
    char words[100][16]; //Holds the words
    int uniquewords[100]; //Holds the amount of times each word appears
    int counter; //counter

    if (argc = 2)
    {
        cout << "Success";
    }
    else
    {
        cout << "Please enter the names of the files again. \n";
        return 1;
    }

    ifstream inputfile;
    ofstream outputfile;

    inputfile.open(argv[1]);
    outputfile.open(argv[2]);

    char *token;
    while(inputfile.getline(words, 100))
    {
        token = strtok(words[100][16], " ");
        cout << token;
    }
}

The error message I'm getting is

error: no matching function to call to 'std::basic_ifstream::getline(char [100][16], int)'

CodePudding user response：

First of all, the line

if (argc = 2)

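is probably not doing what you intend: = is an assignment, so the condition stores 2 in argc and is always true. You should probably use the comparison operator == instead:

if (argc == 2)

Note that your program also opens argv[2]. If both file names are required, the check would have to be argc == 3.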

The function std::istream::getline requires as a first parameter a char *, which is the address of a memory buffer to write to. However, you are passing it the 2D array words, which does not make sense.

You could theoretically pass it words[0], which has space for 16 characters. Passing words[0] will decay to &words[0][0], which is of the required type char *. However, the size of 16 characters will probably not be sufficient. Also, it does not make sense to write the whole line into words, as this 2D array seems to be intended to store the result of strtok.

Therefore, I recommend that you introduce an additional array that is supposed to store the entire line:

char line[200];
(...)
while( inputfile.getline( line, sizeof line ) )

Also, the line

token = strtok(words[100][16], " ");

does not make sense either. The expression words[100][16] accesses the array words out of bounds, and it is a single char, whereas std::strtok requires a char * that points to the string to tokenize.

Another issue is that you should call std::strtok several times, once for every token. The first parameter of std::strtok should only be non-NULL on the first invocation. It should be NULL on all subsequent calls, unless you want to start tokenizing a different string.

After copying all tokens to words, you can then print them in a loop:

#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;

int main(int argc, char **argv)
{
    char line[200];
    char words[100][16];
    int counter = 0;

    ifstream inputfile;

    inputfile.open(argv[1]);

    while( inputfile.getline( line, sizeof line) )
    {
        char *token;

        token = strtok( line, " ");

        while ( token != nullptr )
        {
            strcpy( words[counter++], token );
            token = strtok( nullptr, " " );
        }
    }

    //print all found tokens
    for ( int i = 0; i < counter; i++ )
    {
        cout << words[i] << '\n';
    }
}

For the input

This is the first line.
This is the second line.

the program has the following output:

This
is
the
first
line.
This
is
the
second
line.

As you can see, the strings were correctly tokenized.

However, note that you will be writing to the array words out of bounds if

any of the tokens is longer than 15 characters (the 16th element is needed for the terminating null character), or
the total number of tokens is higher than 100.

To prevent this from happening, you could add additional checks and abort the program if such a condition is detected. An alternative would be to use a std::vector of std::string instead of a fixed-size array of C-style strings. That solution would be more flexible and would not have the problems mentioned above.
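The additional checks mentioned above could look roughly like the following sketch. The helper name add_tokens and the constants MAX_WORDS and MAX_LEN are made up for illustration, and reporting the problem through a bool return value is only one possible design.

```cpp
#include <cstddef>
#include <cstring>

const int         MAX_WORDS = 100;  // same capacity as words[100][16]
const std::size_t MAX_LEN   = 16;   // 15 characters plus the terminating '\0'

// Splits `line` in place and copies each token into `words`, starting at
// index `count`. Returns false, without storing the offending token, if a
// token is too long or the array is already full.
bool add_tokens(char *line, char words[][MAX_LEN], int &count)
{
    for (char *token = std::strtok(line, " ");
         token != nullptr;
         token = std::strtok(nullptr, " "))
    {
        if (count >= MAX_WORDS)
            return false;                    // more than 100 tokens
        if (std::strlen(token) >= MAX_LEN)
            return false;                    // token longer than 15 characters
        std::strcpy(words[count++], token);  // safe: both limits checked above
    }
    return true;
}
```

Inside the getline loop, the inner while loop would then be replaced by a call such as add_tokens(line, words, counter), and the program could print an error message and return 1 when the call returns false.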

CodePudding user response：

To read a string into a character array using C++, use std::cin.get(). You can then tokenize the string using strtok() if you wish.

#include <cstring>
#include <iostream>

int main()
{
  char   s[1000];
  char * words[100] = {nullptr};  // each char* points to a piece of s[]
  int    size = 0;
  
  std::cout << "s? ";
  std::cin.get( s, sizeof(s) );
  
  const char * delimiters = " ";  // " .!:;,?" etc
  
  for (char * token = strtok( s, delimiters );  token;  token = strtok( NULL, delimiters ))
  {
    words[size++] = token;  // this just adds each word to the list
  }
  
  for (int n = 0;  n < size;  n++)
    std::cout << (n+1) << ": " << words[n] << "\n";
}

If you wish to count how many words are unique, you need to use a slightly different method than just adding the word to the end of an array. Instead, look through the array to see if the word is already there. If it is, increment its corresponding count:

char * words [100] = {nullptr};
int    counts[100] = {0};
int    size = 0;

n = index of word in words[], or size if word is not in words[]
counts[n] += 1;
words[n] = word;

You will need strcmp() to determine if the word is equal to a word found in words[]:

if (strcmp( word, words[n] ) == 0)
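Put together, the lookup and the counting might be written as in the sketch below. The function names find_word and count_words and the constant MAX_WORDS are placeholders I chose for illustration, and stopping when the array is full is just one way to handle that case.

```cpp
#include <cstring>

const int MAX_WORDS = 100;  // capacity of words[] and counts[]

// Returns the index of `word` in words[0..size), or `size` if it is not there.
int find_word(char *const words[], int size, const char *word)
{
    int n = 0;
    while (n < size && std::strcmp(word, words[n]) != 0)
        n++;
    return n;
}

// Tokenizes `s` in place and counts how often each distinct word occurs.
// Returns the number of distinct words. The entries of words[] point into `s`.
int count_words(char *s, char *words[], int counts[])
{
    int size = 0;
    for (char *token = std::strtok(s, " ");  token;  token = std::strtok(nullptr, " "))
    {
        int n = find_word(words, size, token);
        if (n == size)              // first occurrence of this word
        {
            if (size == MAX_WORDS)  // no room left: stop rather than overflow
                break;
            words[size]  = token;
            counts[size] = 0;
            size++;
        }
        counts[n] += 1;
    }
    return size;
}
```

After the call, words[n] and counts[n] for n from 0 to size-1 hold each distinct word and the number of times it appeared.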

Choice of array: `char * words[100]` vs `char words[100][16]`

Notice how I am not using strcpy(). This is because I am simply using the address of each word found in s. If you modify s, you break your words. Nevertheless, if you do not plan to modify s then it is how I recommend you do it.
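To illustrate what "break your words" means, here is a small sketch. The names first_token and token_survives_reuse are hypothetical and exist only for this demonstration. It matters in particular if you read every line of a file into the same buffer.

```cpp
#include <cstring>

// Returns a pointer to the first space-separated token inside `s`.
// No characters are copied: the result points into the caller's buffer.
char *first_token(char *s)
{
    return std::strtok(s, " ");
}

// Saves a token pointer, then overwrites the buffer as reading the next
// line would do, and reports whether the saved word is still intact.
bool token_survives_reuse()
{
    char s[32] = "alpha beta";
    char *saved = first_token(s);             // points at s[0], reads "alpha"
    std::strcpy(s, "gamma delta");            // buffer reused for new content
    return std::strcmp(saved, "alpha") == 0;  // false: saved now reads "gamma delta"
}
```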

However, you can, if you wish, strcpy() the words found in s to a words[][] array as you have it in your current code:

char words[100][16];
int size = 0;

for (char * token = ...)
{
  strcpy( words[size++], token );
}

This presumes, however, that every word you find will definitely be 15 characters or fewer. You can use strncpy() to make sure that each word does not overflow your words[n][] buffer.
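One detail to keep in mind: strncpy() does not write a terminating null character when the source is longer than the given count, so you have to add it yourself. A minimal sketch, where the helper name copy_word and the constant WORD_LEN are my own and silently truncating long words is an assumption about what you want:

```cpp
#include <cstddef>
#include <cstring>

const std::size_t WORD_LEN = 16;  // size of each words[n][] buffer

// Copies at most WORD_LEN - 1 characters of `token` into `dest` and always
// terminates the result. Longer tokens are silently truncated.
void copy_word(char dest[WORD_LEN], const char *token)
{
    std::strncpy(dest, token, WORD_LEN - 1);
    dest[WORD_LEN - 1] = '\0';  // strncpy() does not add this when it truncates
}
```

In the loop above, the strcpy() call would become copy_word( words[size++], token ).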

IMHO all this string copying just makes life a little more difficult for you, though it does mean you can reuse s without ruining any of the words you have already stored in words[].
