Perform opportunistic disk reads

In Linux, how do I implement a function (or direct system call) that will:

block until at least some minimum number of bytes (32 or 4,096) are read from an ordinary file (on local disk, not a “file” connected to a network device) and
provide, when it does return, as many bytes as are currently available (immediately, with no further blocking), up to a passed buffer size (32 MiB)?

CodePudding user response：

This is what read does:

If there's no data available, it waits until data is available.
If/when there's data available, it returns all available data up to the specified amount.

For example, say you want to read at least 4 KiB, and you ask for 64 KiB. If there's 8 KiB available, read will immediately return that 8 KiB. (It won't wait for another 56 KiB to arrive.)
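
To see that behavior in isolation, here is a minimal sketch. It assumes Linux and uses a pipe rather than a disk file, only because a pipe makes "8 KiB currently available" easy to reproduce; for a regular file, the available data is whatever lies between the file offset and the end of the file. The helper name demo_short_read is made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: put `available` bytes into a pipe, then ask read()
 * for up to `requested` bytes. Returns what read() reported, or -1 on error. */
ssize_t demo_short_read(size_t available, size_t requested) {
    /* Assumption: stay well below the default Linux pipe capacity so that
     * write() itself never blocks. */
    assert(available > 0 && available <= 16 * 1024);
    assert(requested > 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    char *src = calloc(1, available);
    char *dst = malloc(requested);
    ssize_t got = -1;

    if (src != NULL && dst != NULL &&
        write(fds[1], src, available) == (ssize_t)available) {
        /* The write end is still open, so read() cannot report EOF here.
         * It returns what is buffered instead of waiting for more. */
        got = read(fds[0], dst, requested);
    }

    free(src);
    free(dst);
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

Calling demo_short_read(8 * 1024, 64 * 1024) corresponds to the case described above: 8 KiB is available, 64 KiB is requested, and read comes back with the 8 KiB.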

Similarly, if only 2 KiB is initially available, read will return that. There's no way to tell the system to wait until 4 KiB is available. So that means we need to call read in a loop if we want to obtain a minimum amount.

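#include <limits.h>   /* SSIZE_MAX */
#include <unistd.h>   /* read, ssize_t */

ssize_t read_min( int fd, void *buf, size_t max_to_read, size_t min_to_read ) {
   if ( min_to_read > SSIZE_MAX )
      min_to_read = SSIZE_MAX;
   if ( max_to_read > SSIZE_MAX )
      max_to_read = SSIZE_MAX;

   size_t total_read = 0;
   while ( total_read < min_to_read ) {
      /* Cast to char * because arithmetic on void * is not standard C. */
      ssize_t bytes_read = read( fd, (char *)buf + total_read, max_to_read );
      if ( bytes_read < 0 )
         return -1;           /* Error; errno is set by read. */
      if ( bytes_read == 0 )
         return total_read;   /* EOF before reaching the minimum. */

      max_to_read -= bytes_read;
      total_read  += bytes_read;
   }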
   }

   return total_read;
}

CodePudding user response：

Alternatively, you can use a blocking select() call on the file descriptor.

We assume the file is constantly being written to, like a log file or a CDR file.
When you don't specify a timeout in the select() call, it blocks until the file descriptor is available for read/write (just for read in our case).
Using FD_SET, we add our fd to the set and make select() wait until it is ready for reading.
When select() returns, verify that the fd is indeed ready for reading using the FD_ISSET macro.
Repeat these steps until you've read enough data to meet your requirement.
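
One caveat worth checking before relying on this for an ordinary disk file: POSIX specifies that file descriptors for regular files always select true for ready to read, even when the file offset is already at end of file. So on a regular file, select() may return immediately and the following read() may return 0. The sketch below is one way to check this on your own system. It uses a zero timeout so that it cannot block, and the helper name and the temporary file under /tmp are assumptions made for the illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() under strict -std= modes */
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: create an EMPTY regular file and ask select() whether
 * it is readable. Returns 1 if reported readable, 0 if not, -1 on error.
 * *read_result receives what a 1-byte read() then returns (0 means EOF). */
int empty_file_selects_readable(ssize_t *read_result) {
    char path[] = "/tmp/select_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);  /* The file disappears once fd is closed. */

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* Zero timeout: poll once and return, never block. */
    struct timeval no_wait = { 0, 0 };
    int status = select(fd + 1, &rfds, NULL, NULL, &no_wait);
    int readable = (status == -1) ? -1 : (FD_ISSET(fd, &rfds) ? 1 : 0);

    char byte;
    *read_result = read(fd, &byte, 1);  /* Nothing was written, so this is EOF. */
    close(fd);
    return readable;
}
```

If the helper reports 1 together with a read result of 0, then select() did not wait for new data on that file, and a loop built on it will spin at end of file unless it sleeps or returns there.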

This is a common method used with sockets to read the expected payload/data after a header/type-value.

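#include <limits.h>      /* SSIZE_MAX */
#include <stdio.h>       /* perror */
#include <sys/select.h>  /* select, fd_set, FD_* macros */
#include <unistd.h>      /* read, ssize_t */

// assumes buf_sz > min_to_read
ssize_t read_min (const int fd, void *buf, size_t buf_sz, size_t min_to_read) {
    if (buf_sz > SSIZE_MAX) buf_sz = SSIZE_MAX;
    if (min_to_read > buf_sz) min_to_read = buf_sz;
    size_t total = 0;
    ssize_t bytes_read = 0;

    while (total < min_to_read) {
        fd_set rfds;
        FD_ZERO (&rfds);
        FD_SET (fd, &rfds);

        int status = select (fd + 1, &rfds, NULL, NULL, NULL); // blocking call
        if (-1 == status) {
            perror ("read_min-select()");
            return -1; // rfds is not meaningful after a failed select()
        }
        if (FD_ISSET (fd, &rfds)) {
            // cast to char * because arithmetic on void * is not standard C
            bytes_read = read (fd, (char *)buf + total, buf_sz - total);
            if (bytes_read > 0)
                total += bytes_read;
            if (bytes_read < 0) {
                perror ("read_min-read()");
                return -1; // stop instead of retrying a failing read() forever
            }
         /* if (0 == bytes_read) // EOF, do you want to return here?
               return total;
          */
        }
    }
    return total;
}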

CodePudding user response：

There is no standard function that does what I think you are asking:

perform a read that blocks until it transfers at least n bytes, where n > 1, yet
read up to many times n bytes if it can do so without blocking any longer than it needs to in order to get n bytes.

You can build your own on top of read(), however. read() is a good fit because (with blocking I/O, the default) it does what you want for the case of n == 1. Therefore, if you simply perform read()s into the same buffer until you have transferred at least n bytes in total, or the end of the file is reached or an error occurs, then the overall result is what you describe. Something like this:

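#include <unistd.h>   /* read, ssize_t */

ssize_t read_min(int fd, void *buf, size_t buf_size, size_t min_to_read) {
    size_t total_read = 0;
    ssize_t nread;

    do {
        /* Cast to char * because arithmetic on void * is not standard C. */
        nread = read(fd, (char *)buf + total_read, buf_size - total_read);
        if (nread > 0)
            total_read += nread;
    } while (nread > 0 && total_read < min_to_read);

    /* nread == 0 means EOF (or a full buffer); return what was transferred. */
    return (nread < 0) ? -1 : (ssize_t)total_read;
}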
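
For completeness, here is a rough sketch of how such a function might be called with the numbers from the question: a 4,096-byte minimum and a 32 MiB buffer. It repeats a compact version of the read loop so that it compiles on its own. The helper demo_read_min and the temporary file under /tmp are assumptions made for the illustration and are not part of any answer above.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() under strict -std= modes */
#include <stdlib.h>
#include <unistd.h>

/* Compact version of the loop: call read() until at least min_to_read bytes
 * have been transferred, the buffer is full, EOF is reached, or read() fails. */
ssize_t read_min(int fd, void *buf, size_t buf_size, size_t min_to_read) {
    size_t total = 0;
    while (total < min_to_read && total < buf_size) {
        ssize_t n = read(fd, (char *)buf + total, buf_size - total);
        if (n < 0)
            return -1;   /* Error; errno is set by read(). */
        if (n == 0)
            break;       /* EOF before reaching the minimum. */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical demo: create a temporary file holding file_size zero bytes
 * (file_size must not exceed 32 MiB), then read it back with a 4,096-byte
 * minimum and a 32 MiB buffer. Returns read_min()'s result, or -1 on error. */
ssize_t demo_read_min(size_t file_size) {
    const size_t min_bytes = 4096;
    const size_t buf_size = 32u * 1024 * 1024;

    char path[] = "/tmp/read_min_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);  /* The file disappears once fd is closed. */

    char *buf = calloc(1, buf_size);
    ssize_t result = -1;
    if (buf != NULL &&
        write(fd, buf, file_size) == (ssize_t)file_size &&
        lseek(fd, 0, SEEK_SET) == 0) {
        result = read_min(fd, buf, buf_size, min_bytes);
    }

    free(buf);
    close(fd);
    return result;
}
```

With a 100-byte file, the call returns 100: the first read() delivers everything, the second reports EOF, and the loop stops short of the minimum. With a larger file, the result is at least 4,096 and can be as large as the file or the buffer allows.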