Why should I put SEEK_SET twice

Time:01-31

I want to replace some vowels in a file with "5". The following code works. However, I do not understand why I need to call fseek twice.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

void print_file_contents(const char *filename) 
{
    FILE *fp;
    char letter;

    if((fp=fopen(filename,"r+"))==NULL)
    {
        printf("error\n");
        exit(1);
    }

    fseek(fp,0,SEEK_END);
    int size=ftell(fp);
    rewind(fp);

    for(int i=0;i<size;i++)
    {
        fseek(fp,i,SEEK_SET);
        letter=fgetc(fp);
    
        if((letter=='a') || (letter=='e') || (letter=='i'))
        {
            fseek(fp,i,SEEK_SET); // WHY THIS FSEEK ?
            fwrite("5",1,sizeof(char),fp);
        }

    }

    fclose(fp);
}

int main(int argc, char *argv[])
{
    print_file_contents("myfile");
    return 0;
}

In my opinion, the first fseek(fp, i, SEEK_SET) is used to set the file position indicator to the current character being processed, so that the character can be read using fgetc. Hence, the cursor is updated every time so there is no need to add another fseek(fp, i, SEEK_SET);.

CodePudding user response:

The fgetc advanced the file position; if you want to replace the character you just read, you need to rewind back to the same position you were in when you read the character to replace.
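
A minimal sketch of that effect, assuming a binary stream so that byte offsets from ftell are meaningful. The helper name offset_advance_after_fgetc is hypothetical and only serves the illustration: it reports how far the file position moved after a single fgetc.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many bytes the file position advanced
 * after one fgetc(), or -1 on error or end-of-file. */
long offset_advance_after_fgetc(FILE *fp)
{
    long before = ftell(fp);
    if (before == -1L) {
        return -1;
    }
    if (fgetc(fp) == EOF) {
        return -1;
    }
    long after = ftell(fp);
    if (after == -1L) {
        return -1;
    }
    /* On a binary stream this is 1: the position now points at the
     * character after the one that was just read. */
    return after - before;
}
```

Because the position has already moved past the character, a write at this point would target the following character, which is why the code seeks back first.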

CodePudding user response:

Note that the C standard mandates a seek-like operation when you switch between reading and writing (and between writing and reading).

§7.21.5.3 The fopen function ¶7:

¶7 When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.

Also, calling fgetc() moves the file position forward one character; if the write worked (it's undefined behaviour if you omit the seek-like operation), you'd overwrite the next character, not the one you just read.
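
As a rough sketch of both transitions, assuming a binary update-mode stream: the hypothetical helper overwrite_first_then_read_next (not from the question) reads one byte, seeks back before writing, and flushes before reading again.

```c
#include <stdio.h>

/* Hypothetical helper: overwrite the first byte of an update-mode stream
 * and return the byte that follows it, or EOF on any failure. */
int overwrite_first_then_read_next(FILE *fp, int replacement)
{
    rewind(fp);
    if (fgetc(fp) == EOF) {              /* input */
        return EOF;
    }
    /* Input -> output: a file positioning call is required. This one also
     * moves back onto the byte that was just read. */
    if (fseek(fp, -1L, SEEK_CUR) != 0) {
        return EOF;
    }
    if (fputc(replacement, fp) == EOF) { /* output */
        return EOF;
    }
    /* Output -> input: fflush (or a positioning call) is required. */
    if (fflush(fp) != 0) {
        return EOF;
    }
    return fgetc(fp);                    /* input again */
}
```

In this sketch the single fseek does double duty: it satisfies the rule for switching from input to output and it puts the position back on the character to be replaced.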

CodePudding user response:

Your intuition is correct: two of the three fseek calls in this program are unnecessary.

The necessary fseek is the one inside the if((letter=='a') || (letter=='e') || (letter=='i')) conditional. That one is needed to back up the file position so you overwrite the character you just read (i.e. the vowel), not the character after the vowel.

The fseek inside the loop (but outside the if) is unnecessary because both fgetc and fwrite advance the file position, so it will always set the file position to the position it already has. And the fseek before the loop is unnecessary because you do not need to know how big the file is to implement this algorithm.

This code can be tightened up considerably. I'd write it like this:

#include <stdio.h>
#include <stdlib.h>

void replace_aei_with_5_in_place(const char *filename) 
{
    FILE *fp = fopen(filename, "r+");  // (1)
 
    if (!fp) {
        perror(filename);  // (2)
        exit(1);
    }

    int letter;
    while ((letter = fgetc(fp)) != EOF) {  // (3)
        if (letter == 'a' || letter == 'e' || letter == 'i') { // (4)
            fseek(fp, -1, SEEK_CUR); // (5)
            fputc('5', fp);
            if (fflush(fp)) { // (6)
                perror(filename);
                exit(1);
            }
        }
    }

    if (fclose(fp)) {  // (7)
        perror(filename);
        exit(1);
    }
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s filename\n", argv[0]);
        return 1;
    }
    replace_aei_with_5_in_place(argv[1]); // (8)
    return 0;
}

Notes:

  1. It is often (but not always) better to write operations with side effects, like fopen, separately from conditionals checking whether they succeeded.
  2. When a system-level operation fails, always print both the name of any file involved, and the decoded value of errno. perror(filename) is a convenient way to do this.
  3. You don't need to know the size of the file you're crunching because you can use a loop like this, instead. Also, this is an example of an exception to (1).
  4. Why not 'o' and 'u' also?
  5. Here's the necessary call to fseek, and the other reason you don't need to know the size of the file: you can use SEEK_CUR to back up by one character.
  6. This fflush is necessary because we're switching from writing to reading, as stated in Jonathan Leffler's answer. Inconveniently, it also consumes the notification for some (but not all) I/O errors, so you have to check whether it failed.
  7. Because you are writing to the file, you must also check for delayed I/O errors, reported only on fclose. (This is a design error in the operating system, but one that we are permanently stuck with.)
  8. Best practice is to pass the name of the file to munge on the command line, not to hardcode it into the program.

CodePudding user response:

@Jonathan Leffler explains well why the code uses multiple fseek() calls: to cope with changing between reading and writing.


int size=ftell(fp); is weak: ftell() returns a long, and a large offset may not fit in an int.
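
A small sketch of keeping the result in a long and checking for failure. The helper name stream_size is hypothetical, and the sketch assumes a binary stream on an implementation where seeking to SEEK_END is meaningful (the C standard does not require that for binary streams).

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes of a binary stream, or -1 on error.
 * The original file position is restored before returning. */
long stream_size(FILE *fp)
{
    long start = ftell(fp);
    if (start == -1L) {
        return -1L;
    }
    if (fseek(fp, 0L, SEEK_END) != 0) {
        return -1L;
    }
    long size = ftell(fp);  /* keep the full long range, do not narrow to int */
    if (fseek(fp, start, SEEK_SET) != 0) {
        return -1L;
    }
    return size;            /* -1 here if the second ftell failed */
}
```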


Seeking in a text file (as OP has) also risks undefined behavior (UB).

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET. C17dr § 7.21.9.1 3.

Better to use an approach like @zwol's, with a small change.

Do not assume a smooth linear mapping. Instead, note the location and then return to it as needed.

int replacement = '5';
for (;;) {
    long position = ftell(fp);
    if (position == -1) {
        perror(filename);
        exit(1);
    }
    int letter = fgetc(fp);
    if (letter == EOF) {
        break;
    }
    if (letter == 'a' || letter == 'e' || letter == 'i') {
        fseek(fp, position, SEEK_SET);
        fputc(replacement, fp);
        if (fflush(fp)) {
            perror(filename);
            exit(1);
        }
    }
}

Research fgetpos(), fsetpos() for an even better solution that handles all file sizes, even ones longer than LONG_MAX.
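
One way this might look, as a sketch rather than a definitive implementation: the function name replace_aei_using_fpos and its return convention (0 on success, -1 on error) are assumptions made for the example. It records the position with fgetpos before each read and returns to it with fsetpos when a replacement is needed.

```c
#include <stdio.h>

/* Hypothetical helper: replace 'a', 'e', 'i' with `replacement` in an
 * update-mode stream. Returns 0 on success, -1 on an I/O error. */
int replace_aei_using_fpos(FILE *fp, int replacement)
{
    for (;;) {
        fpos_t position;
        /* Remember where the next character starts. */
        if (fgetpos(fp, &position) != 0) {
            return -1;
        }
        int letter = fgetc(fp);
        if (letter == EOF) {
            return ferror(fp) ? -1 : 0;
        }
        if (letter == 'a' || letter == 'e' || letter == 'i') {
            /* Input -> output: go back to the recorded position. */
            if (fsetpos(fp, &position) != 0) {
                return -1;
            }
            if (fputc(replacement, fp) == EOF) {
                return -1;
            }
            /* Output -> input: flush before the next fgetc. */
            if (fflush(fp) != 0) {
                return -1;
            }
        }
    }
}
```

Since fpos_t is an opaque type, this version never does arithmetic on offsets, so it does not depend on positions fitting in a long.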
