Can advice of posix_fadvise() be combined?

Time:02-16

I am trying to hash a large number of files and want to saturate I/O on my system as far as reasonably possible. For this use case, three things are true at once:

  • I will read the file only once
  • I will need the whole file
  • The file will be read sequentially

Can I combine fadvise() suggestions, or if I make multiple suggestions on the same range does one override the other?

I am currently making three separate calls, since the advice values do not appear to be bit flags that can be OR'd together:

os.posix_fadvise(f, 0, 0, os.POSIX_FADV_SEQUENTIAL)
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_WILLNEED)
os.posix_fadvise(f, 0, 0, os.POSIX_FADV_NOREUSE)

Previously I only advised WILLNEED. From the man page, fadvise seems to just set a read-ahead policy, and WILLNEED seems the most sensible choice. But it is also true that I am reading the data sequentially off an HDD and do not intend to read it a second time.

Is the behaviour for this even defined, or is it up to the implementer on each target platform?

Answer:

Implementation

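In the Linux implementation of fadvise that I found, the advice value goes through a switch statement, so each call handles exactly one advice. Some cases change per-file state: the read-ahead window file->f_ra.ra_pages and the FMODE_RANDOM bit in file->f_mode are set or cleared by NORMAL, RANDOM and SEQUENTIAL. Other cases leave that state alone: WILLNEED only triggers an immediate force_page_cache_readahead() over the requested range, and NOREUSE does nothing at all in this version.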

switch (advice) {
    case POSIX_FADV_NORMAL:
        file->f_ra.ra_pages = bdi->ra_pages;
        spin_lock(&file->f_lock);
        file->f_mode &= ~FMODE_RANDOM;
        spin_unlock(&file->f_lock);
        break;
    case POSIX_FADV_RANDOM:
        spin_lock(&file->f_lock);
        file->f_mode |= FMODE_RANDOM;
        spin_unlock(&file->f_lock);
        break;
    case POSIX_FADV_SEQUENTIAL:
        file->f_ra.ra_pages = bdi->ra_pages * 2;
        spin_lock(&file->f_lock);
        file->f_mode &= ~FMODE_RANDOM;
        spin_unlock(&file->f_lock);
        break;
    case POSIX_FADV_WILLNEED:
        /* First and last PARTIAL page! */
        start_index = offset >> PAGE_SHIFT;
        end_index = endbyte >> PAGE_SHIFT;
        /* Careful about overflow on the "+1" */
        nrpages = end_index - start_index + 1;
        if (!nrpages)
            nrpages = ~0UL;
        /*
         * Ignore return value because fadvise() shall return
         * success even if filesystem can't retrieve a hint,
         */
        force_page_cache_readahead(mapping, file, start_index, nrpages);
        break;
    case POSIX_FADV_NOREUSE:
        break;
    case POSIX_FADV_DONTNEED:
        if (!inode_write_congested(mapping->host))
            __filemap_fdatawrite_range(mapping, offset, endbyte,
                           WB_SYNC_NONE);
        /*
         * First and last FULL page! Partial pages are deliberately
         * preserved on the expectation that it is better to preserve
         * needed memory than to discard unneeded memory.
         */
        start_index = (offset+(PAGE_SIZE-1)) >> PAGE_SHIFT;
        end_index = (endbyte >> PAGE_SHIFT);
        /*
         * The page at end_index will be inclusively discarded according
         * by invalidate_mapping_pages(), so subtracting 1 from
         * end_index means we will skip the last page.  But if endbyte
         * is page aligned or is at the end of file, we should not skip
         * that page - discarding the last page is safe enough.
         */
        if ((endbyte & ~PAGE_MASK) != ~PAGE_MASK &&
                endbyte != inode->i_size - 1) {
            /* First page is tricky as 0 - 1 = -1, but pgoff_t
             * is unsigned, so the end_index >= start_index
             * check below would be true and we'll discard the whole
             * file cache which is not what was asked.
             */
            if (end_index == 0)
                break;
            end_index--;
        }
        if (end_index >= start_index) {
            unsigned long count;
            /*
             * It's common to FADV_DONTNEED right after
             * the read or write that instantiates the
             * pages, in which case there will be some
             * sitting on the local LRU cache. Try to
             * avoid the expensive remote drain and the
             * second cache tree walk below by flushing
             * them out right away.
             */
            lru_add_drain();
            count = invalidate_mapping_pages(mapping,
                        start_index, end_index);
            /*
             * If fewer pages were invalidated than expected then
             * it is possible that some of the pages were on
             * a per-cpu pagevec for a remote CPU. Drain all
             * pagevecs and try again.
             */
            if (count < (end_index - start_index + 1)) {
                lru_add_drain_all();
                invalidate_mapping_pages(mapping, start_index,
                        end_index);
            }
        }
        break;
    default:
        return -EINVAL;
    }
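
To make the effect of repeated calls easier to follow, here is a toy Python model of the switch above. It is only a sketch of the logic in this one kernel version, not a real API: FileState, apply_advice and DEFAULT_RA_PAGES are hypothetical names, the value 32 is an arbitrary stand-in for bdi->ra_pages, and DONTNEED is left out.

```python
from dataclasses import dataclass

# Hypothetical stand-in for bdi->ra_pages; the real value depends on the device.
DEFAULT_RA_PAGES = 32


@dataclass
class FileState:
    """Toy copy of the per-file fields that the switch touches."""
    ra_pages: int = DEFAULT_RA_PAGES   # models file->f_ra.ra_pages
    random_mode: bool = False          # models the FMODE_RANDOM bit
    readahead_requests: int = 0        # counts one-shot read-ahead calls


def apply_advice(state: FileState, advice: str) -> FileState:
    """Mirror the cases of the kernel switch for one advice value."""
    if advice == "NORMAL":
        state.ra_pages = DEFAULT_RA_PAGES
        state.random_mode = False
    elif advice == "RANDOM":
        state.random_mode = True
    elif advice == "SEQUENTIAL":
        state.ra_pages = DEFAULT_RA_PAGES * 2
        state.random_mode = False
    elif advice == "WILLNEED":
        # One-shot action: per-file state is left untouched.
        state.readahead_requests += 1
    elif advice == "NOREUSE":
        # No-op in the kernel version quoted above.
        pass
    else:
        raise ValueError(f"unsupported advice: {advice}")
    return state


state = FileState()
for advice in ("SEQUENTIAL", "WILLNEED", "NOREUSE"):
    apply_advice(state, advice)
print(state)
```

In this model the three calls from the question leave the doubled read-ahead window in place and record a single read-ahead request, so SEQUENTIAL and WILLNEED do not cancel each other out.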

Conclusion

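Implementations on systems other than Linux may differ, since POSIX does not seem to spell out how different advice values interact. On Linux, judging by the code above, some effects combine while others replace each other: SEQUENTIAL followed by WILLNEED leaves the doubled read-ahead window in place and also starts a read-ahead of the range, whereas NORMAL, RANDOM and SEQUENTIAL write to the same per-file fields, so the most recent call among them wins for those fields. Hopefully someone more experienced can elucidate.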
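
For the hashing use case in the question, a minimal sketch might look like the following. It assumes SHA-256 and a 1 MiB read size, both arbitrary choices, and hash_file is a hypothetical helper name. Because the advice is only a hint, the sketch skips it where os.posix_fadvise is unavailable and ignores any OSError it raises. NOREUSE is omitted since it is a no-op in the kernel code shown above.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice


def hash_file(path, algorithm="sha256"):
    """Hash one file sequentially, giving the kernel read-ahead hints first."""
    digest = hashlib.new(algorithm)
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # offset 0, len 0 means "from the start to the end of the file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            except OSError:
                # The advice is optional; carry on without it.
                pass
        while True:
            chunk = os.read(fd, CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

Note that os.posix_fadvise takes a file descriptor, which is why the sketch uses os.open and os.read rather than a file object.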
