Reading comma-separated words from a file

Time:04-29

FILE* inp;
inp = fopen("wordlist.txt","r");        //filename of your data file
char arr[100][5];           //max word length 5
int i = 0;
while(1){
    char r = (char)fgetc(inp);
    int k = 0;
    while(r!=',' && !feof(inp)){    //read till , or EOF
        arr[i][k++] = r;            //store in array
        r = (char)fgetc(inp);
    }
    arr[i][k]=0;        //make last character of string null
    if(feof(inp)){      //check again for EOF
        break;
    }
    i++;
}

I am reading the file words and storing them in the array. My question is: how can I randomly select 7 of these words and store them in the array?

The input file has the following content:

https://ibb.co/LkSJ1SV

meal
cheek
lady
debt
lab
math
basis
beer
bird
thing
mall
exam
user
news
poet
scene
truth
tea
way
tooth
cell
oven

CodePudding user response:

First of all, your program has the following issues:

  1. In your posted input, some of the words are 5 characters long, but your array only has room for 4 characters plus the terminating null character.
  2. The words in your posted input are separated by newline characters, not commas. Therefore, it does not make sense that you are searching the input stream for ',' instead.

After fixing these two issues in your code and adding a function main and all necessary headers, it should look like this:

#include <stdio.h>

int main( void )
{
    FILE* inp;
    inp = fopen("wordlist.txt","r");        //filename of your data file
    char arr[100][6];           //max word length 5
    int i = 0;
    while(1) {
        char r = (char)fgetc(inp);
        int k = 0;
        while(r!='\n' && !feof(inp)) {   //read till newline or EOF
            arr[i][k++] = r;            //store in array
            r = (char)fgetc(inp);
        }
        arr[i][k]=0;        //make last character of string null
        if(feof(inp)){      //check again for EOF
            break;
        }
        i++;
    }
}
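The loop above stores the result of fgetc in a char, does not check whether fopen succeeded, and does not check whether a word fits into the buffer. As a minimal sketch of a more defensive reader, assuming one word per line and lines shorter than 255 characters, you could read with fgets instead. The names read_words and WORD_SIZE are hypothetical and not part of the original code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WORD_SIZE 6     /* 5 characters plus the terminating null */

/* Hypothetical helper: reads one word per line from fp into arr and
   returns the number of words stored. Empty lines and words that do
   not fit into WORD_SIZE are skipped. */
int read_words(FILE *fp, char arr[][WORD_SIZE], int max_words)
{
    assert(fp != NULL && max_words > 0);

    char line[256];
    int count = 0;

    while (count < max_words && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */

        size_t len = strlen(line);
        if (len == 0 || len >= WORD_SIZE)
            continue;                           /* skip empty or too-long words */

        strcpy(arr[count], line);
        count++;
    }
    return count;
}
```

The caller would check the return value of fopen against NULL before passing the stream to this function, and would use the returned count in place of i.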

In C, it is common to use the function rand to generate a random number between 0 and RAND_MAX. The macro constant RAND_MAX is guaranteed to be at least 32767.

In order to get a random number between 0 and i (not including i itself), you can use the following expression, which uses the modulo operator:

rand() % i

This will not give you an even distribution of random numbers, but it is sufficient for most common purposes.
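If the slight bias does matter, one common workaround is rejection sampling: discard the values of rand() that fall into the incomplete last "bucket" and try again. The following is only a sketch; the function name uniform_index is hypothetical, and it assumes 0 < n <= RAND_MAX:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns an index in [0, n) without modulo bias.
   Assumes 0 < n <= RAND_MAX. */
int uniform_index(int n)
{
    assert(n > 0 && n <= RAND_MAX);

    /* Largest multiple of n that does not exceed RAND_MAX. Values of
       rand() at or above it would make small remainders more likely. */
    int limit = RAND_MAX - (RAND_MAX % n);

    int r;
    do {
        r = rand();
    } while (r >= limit);   /* reject the biased tail and retry */

    return r % n;
}
```

With this helper, uniform_index(i) could be used wherever rand() % i appears. The quality of the result still depends on the quality of the rand implementation itself.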

Therefore, in order to select and print a random word, you can use the following statement:

printf( "%s\n", arr[rand() % i] );

If you want to select and print 7 random words, then you can run this statement in a loop 7 times. However, it is possible that the same word will be selected randomly several times. If you want to prevent this from happening, then you will have to use a more complex algorithm, such as a Fisher-Yates shuffle.

However, this will print the same random sequence of words every time you run your program. If you want the random number generator to generate a different sequence of random numbers every time the program is run, then you must seed the random number generator, by calling the function srand with some random data.

The simplest source of randomness is the current time. The function time will return an integer representing the current time, usually in seconds.

srand( (unsigned)time(NULL) );

However, since the function time usually uses seconds, this means that if you run the program twice in the same second, the random number generator will be seeded with the same value, so it will generate the same sequence of random numbers. If this is an issue, then you may want to find some other source of randomness.
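As one possible sketch of such a source, a C11 compiler provides timespec_get, which also reports the sub-second part of the current time. Mixing it into the seed makes two runs within the same second unlikely to get the same seed, although the actual clock resolution depends on the system. The function name make_seed is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: builds a seed from both the seconds and the
   nanoseconds of the current time. Requires C11 for timespec_get. */
unsigned make_seed(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);   /* timespec_get returns 0 on failure */

    /* combine whole seconds with the sub-second part */
    return (unsigned)ts.tv_sec ^ (unsigned)ts.tv_nsec;
}
```

You would then call srand( make_seed() ) once at program start. This is still not suitable for anything security-related.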

After doing everything described above and adding the necessary headers, your program should look like this:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main( void )
{
    FILE* inp;
    inp = fopen("wordlist.txt","r");        //filename of your data file
    char arr[100][6];           //max word length 5

    srand( (unsigned)time(NULL) );

    int i = 0;
    while(1) {
        char r = (char)fgetc(inp);
        int k = 0;
        while(r!='\n' && !feof(inp)) {   //read till newline or EOF
            arr[i][k++] = r;            //store in array
            r = (char)fgetc(inp);
        }
        arr[i][k]=0;        //make last character of string null
        if(feof(inp)){      //check again for EOF
            break;
        }
        i++;
    }

    //print 7 random words
    for ( int j = 0; j < 7; j++ )
        printf( "%s\n", arr[rand()%i] );
}

For the input

meal
cheek
lady
debt
lab
math
basis
beer
bird
thing
mall
exam
user
news
poet
scene
truth
tea
way
tooth
cell
oven

this program gave me the following (random) output:

user
mall
poet
lab
cheek
lab
beer

As you can see, one of the random words is a duplicate.

As previously stated, you can shuffle the array using a Fisher-Yates shuffle if you want to prevent the same word from being chosen twice. After shuffling the array, you can simply select and print the first 7 elements of the array, if that is the number of words that you want to choose:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main( void )
{
    FILE* inp;
    inp = fopen("wordlist.txt","r");        //filename of your data file
    char arr[100][6];           //max word length 5

    srand( (unsigned)time(NULL) );

    int i = 0;
    while(1) {
        char r = (char)fgetc(inp);
        int k = 0;
        while(r!='\n' && !feof(inp)) {   //read till newline or EOF
            arr[i][k++] = r;            //store in array
            r = (char)fgetc(inp);
        }
        arr[i][k]=0;        //make last character of string null
        if(feof(inp)){      //check again for EOF
            break;
        }
        i++;
    }

    //perform a Fisher-Yates shuffle on the array
    for ( int j = 0; j < i - 1; j++ )
    {
        char temp[6];

        int k = rand() % ( i - j ) + j;

        if ( j != k )
        {
            //swap both array elements
            strcpy( temp, arr[j] );
            strcpy( arr[j], arr[k] );
            strcpy( arr[k], temp );
        }
    }

    //print first 7 elements of the shuffled array
    for ( int j = 0; j < 7; j++ )
    {
        //NOTE: This code assumes that i >= 7, otherwise
        //it may print garbage or crash.

        printf( "%s\n", arr[j] );
    }
}

Now, the same word can no longer be chosen twice:

meal
thing
news
user
mall
exam
tea

In my program above, I shuffled the entire array. However, if I only need the first 7 words to be randomized, then it would be sufficient to only perform 7 iterations of the outer loop when shuffling.
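A minimal sketch of such a partial shuffle could look like the following. The function name partial_shuffle and the macro WORD_SIZE are hypothetical, and the code assumes that the number of picks is not larger than the number of words:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WORD_SIZE 6     /* same element size as char arr[100][6] */

/* Hypothetical helper: randomizes only the first `picks` positions of
   arr, which holds `count` words. Assumes 0 <= picks <= count. */
void partial_shuffle(char arr[][WORD_SIZE], int count, int picks)
{
    assert(picks >= 0 && picks <= count);

    for (int j = 0; j < picks && j < count - 1; j++) {
        /* choose k from [j, count), uniform up to modulo bias */
        int k = rand() % (count - j) + j;

        if (j != k) {
            /* swap both array elements */
            char temp[WORD_SIZE];
            strcpy(temp, arr[j]);
            strcpy(arr[j], arr[k]);
            strcpy(arr[k], temp);
        }
    }
}
```

After calling partial_shuffle( arr, i, 7 ), the first 7 elements are a random selection without duplicates, and the remaining elements are left in whatever order the swaps produced.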

CodePudding user response:

Important

This solution does not pick words with a uniform random distribution, but it works for an input file containing any number of words.

The Idea

  • To keep things simple, each word may hold at most 256 chars (including \0), which is more than enough for small examples.
  • The array has exactly as many elements as the number of words you want to end up with. There is no need to store all the words and then pick 7: you can pick 7 words while reading the file, by overwriting previously stored words at random.
  • The first while loop fills the array so that there are no empty cells.
  • The second while loop overwrites previously filled cells at random.
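For comparison, the same "pick while reading" idea can be made uniform with reservoir sampling. This is a different technique from the coin flip used in this answer, shown here only as a sketch. The function name sample_words is hypothetical, and the code assumes one word per line, lines shorter than CHARS_PER_WORD, and fewer than RAND_MAX words in the file:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOTAL_WORDS 7
#define CHARS_PER_WORD 256

/* Hypothetical helper: keeps up to TOTAL_WORDS lines from fp using
   reservoir sampling and returns how many slots were filled. */
int sample_words(FILE *fp, char picked[TOTAL_WORDS][CHARS_PER_WORD])
{
    assert(fp != NULL);

    char line[CHARS_PER_WORD];
    int seen = 0;   /* number of words read before the current one */

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */
        if (line[0] == '\0')
            continue;                           /* ignore empty lines */

        if (seen < TOTAL_WORDS) {
            strcpy(picked[seen], line);         /* fill the reservoir first */
        } else {
            /* keep the new word with probability TOTAL_WORDS / (seen + 1) */
            int slot = rand() % (seen + 1);
            if (slot < TOTAL_WORDS)
                strcpy(picked[slot], line);
        }
        seen++;
    }
    return seen < TOTAL_WORDS ? seen : TOTAL_WORDS;
}
```

Apart from the small modulo bias of rand() % n, every word in the file ends up in the result with the same probability, and no word can appear twice unless it appears twice in the file.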

Solution

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TOTAL_WORDS 7
#define CHARS_PER_WORD 256

void readWord(FILE* inp, char [TOTAL_WORDS][CHARS_PER_WORD], int i);

int main(int argc, char const *argv[]) {
    srand(time(NULL));

    char arr[TOTAL_WORDS][CHARS_PER_WORD] = { 0 };

    FILE* inp;
    inp = fopen("wordlist.txt","r");
    // make sure file opening did not fail
    if( inp == NULL ) {
        printf("Could not open file.\n");
        return 0;
    }

    int i = 0;

    while( i < TOTAL_WORDS && !feof(inp) )
        readWord(inp,arr,i++);

    while( !feof(inp) ) {
        if( (rand() % 2) == 1 )
            readWord(inp,arr,rand() % TOTAL_WORDS);
        else // consume the word without saving it
            while( fgetc(inp)!='\n' && !feof(inp) ) { } 
    }

    for( int i = 0; i<TOTAL_WORDS; i++ )
        printf("%d: %s\n", i, arr[i]);

    return 0;
}

void readWord(FILE* inp, char arr[TOTAL_WORDS][CHARS_PER_WORD], int i) {
    int k = 0;
    char r = (char) fgetc(inp);
    while( r!='\n' && !feof(inp) ){
        arr[i][k++] = r;
        r = (char) fgetc(inp);
    }
    arr[i][k]='\0';  
}

With input file wordlist.txt containing:

meal
cheek
lady
debt
lab
math
basis
beer
bird
thing
mall
exam
user
news
poet
scene
truth
tea
way
tooth
cell
oven

One of the results was:

0: scene
1: truth
2: tooth
3: oven
4: way
5: user
6: cell

Additions/Changes Explained

Headers that declare the functions srand(), rand() and time(), which are used for randomization.

#include <stdlib.h>
#include <time.h>

Define how many words we want to keep at the end. This is useful if we later want to change from 7 to some other number: instead of using 7 all over the place, we use TOTAL_WORDS wherever we refer to the number of words to keep. Likewise, define the maximum number of chars per word.

#define TOTAL_WORDS 7
#define CHARS_PER_WORD 256

Seed the random number generator used by rand() with the current time.

srand(time(NULL));

Get a random index within the bounds of our array.

rand() % TOTAL_WORDS

To avoid repeating the same code in both while loops, the part that reads a word from the file is encapsulated in a function. That makes the main code a lot simpler to read and maintain.

void readWord(FILE* inp, char arr[TOTAL_WORDS][CHARS_PER_WORD], int i) {
    ...  
}

Print words saved.

for( int i = 0; i<TOTAL_WORDS; i++ )
    printf("%d: %s\n", i, arr[i]);

CodePudding user response:

This solution is a better version of my initial solution.

Solution

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define INPUT_FILE "wordlist.txt"
#define TOTAL_WORDS 7
#define CHARS_PER_WORD 256
#define DELIMETER '\n'

FILE* input;

char word[CHARS_PER_WORD];
char words[TOTAL_WORDS][CHARS_PER_WORD];

void openFile();    
void readWord();
void saveWord(int position);
void pickWords();
void printWords();
int hasWordAt(int position);
int isFull();

int main(int argc, char const *argv[]) {
    srand(time(NULL));

    openFile();

    pickWords();

    printWords();

    return 0;
}

void printWords() {
    for( int i = 0; i<TOTAL_WORDS; i++ )
        printf("%d: %s\n", i, words[i]);
}

void pickWords() {
    int pos;
    while( !feof(input) && !isFull() ) {
        readWord();
        do {
            pos = rand() % TOTAL_WORDS;
        } while( hasWordAt(pos) );
        saveWord(pos);
    }
    while( !feof(input) ) {
        readWord();
        if( (rand() % 2) == 0 )
            continue;
        pos = rand() % TOTAL_WORDS;
        saveWord(pos);
    }
}

int hasWordAt(int position) {
    return words[position][0] != '\0';
}

int isFull() {
    for( int i = 0; i<TOTAL_WORDS; i++ )
        if( words[i][0] == '\0' )
            return 0;
    return 1;
}

void saveWord(int position) {
    strcpy(words[position],word);
}

void readWord() {
    int i = 0;
    char ch = (char) fgetc(input);
    while( ch != DELIMETER && !feof(input) ){
        word[i++] = ch;
        ch = (char) fgetc(input);
    }
    word[i]='\0';  
}

void openFile() {
    input = fopen(INPUT_FILE,"r");
    if( input == NULL ) {
        printf("Couldn't open file.");
        exit(0);
    }
}

Improvements

  • Distribution of first words
  • Code structure

I currently don't have time to write a full explanation for this answer, but most of it is already covered in my initial answer.
