What happens in C function readdir() from dirent.h?

Time:12-23

I'm doing a school assignment where the task is to count files and folders recursively. I use the readdir() function, which seems to iterate through the directory I give it.

int listdir(const char *path) 
{
  struct dirent *entry;
  DIR *dp;

  dp = opendir(path);
  if (dp == NULL) 
  {
    perror("opendir");
    return -1;
  }

  while((entry = readdir(dp)))
    puts(entry->d_name);

  closedir(dp);
  return 0;
}
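
For context, the recursive counting task described above could be sketched roughly as follows. This is only an illustration, not the asker's code: the names count_tree and struct counts are hypothetical, and the sketch assumes a POSIX system where lstat() and PATH_MAX are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result type: running totals for one directory tree. */
struct counts { long files; long dirs; };

/* Hypothetical helper: adds the files and directories found below `path`
 * to `out`. Returns 0 on success, -1 if any part of the tree could not
 * be read. */
int count_tree(const char *path, struct counts *out)
{
    DIR *dp = opendir(path);
    if (dp == NULL)
        return -1;

    int rc = 0;
    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        /* Skip the self and parent links, or the recursion never ends. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char child[PATH_MAX];
        int len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof child) {
            rc = -1;               /* path too long for the buffer */
            continue;
        }

        struct stat st;
        if (lstat(child, &st) != 0) {
            rc = -1;
            continue;
        }

        if (S_ISDIR(st.st_mode)) {
            out->dirs++;
            if (count_tree(child, out) != 0)
                rc = -1;
        } else {
            out->files++;          /* anything that is not a directory */
        }
    }

    closedir(dp);
    return rc;
}
```

lstat() is used instead of stat() so that symbolic links to directories are counted as files rather than followed, which avoids loops in the tree.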

I want to see the "something++;" step of this function; there should be one, right? All I can find is this line in glibc's dirent/dirent.h

extern struct dirent *readdir (DIR *__dirp) __nonnull ((1)); 

and


struct dirent *
__readdir (DIR *dirp)
{
  __set_errno (ENOSYS);
  return NULL;
}
weak_alias (__readdir, readdir)

in dirent/readdir.c

Where does the iteration happen?

Possibly a duplicate of "How readdir function is working inside while loop in C?"

I grepped the glibc source code for readdir and searched the Internet without finding the answer, although some say there is an obsolete Linux system call that is also called readdir.

There is also this

"The readdir() function returns a pointer to a dirent structure representing the next directory entry in the directory stream pointed to by dirp. It returns NULL on reaching the end of the directory stream or if an error occurred."

and this

"The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion."

in man readdir.
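
Because the man page says NULL is returned both at the end of the stream and on error, a caller has to tell the two apart. The usual approach, as documented in readdir(3), is to set errno to 0 before each call and inspect it afterwards. A minimal sketch, where count_entries is a hypothetical helper name:

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: counts the entries in `path`, skipping "." and "..".
 * Returns the count, or -1 with errno set if opendir() or readdir() fails. */
long count_entries(const char *path)
{
    DIR *dp = opendir(path);
    if (dp == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    for (;;) {
        errno = 0;                 /* so a NULL return can be classified */
        entry = readdir(dp);
        if (entry == NULL)
            break;
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }

    int saved = errno;             /* 0: end of stream; nonzero: readdir failed */
    closedir(dp);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return count;
}
```

Calling count_entries(".") should return a non-negative number, while a path that does not exist should return -1 with errno set by opendir().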

From this answer (https://stackoverflow.com/a/9344137/12847376) I gather that functions can be overridden with LD_PRELOAD, but no such variable is set in my default shell, and the Debian source search returns too many hits.

I also grepped through the Linux kernel source for LD_PRELOAD and readdir and got too many results for the syscall.

CodePudding user response:

I'm not sure exactly what you are trying to accomplish. I have implemented something similar for another language's core library, so I can say there is no "something++" step. The reason is that the structures returned by the operating system do not have a consistent size. The structure is something like the following:

struct dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_type;
    char           d_name[];
};

You pass a buffer to the system call (I used getdents64), and it fills it in with a bunch of these dirent structures. That d_name[] does not have an officially known size. The size of the entire structure is defined by that d_reclen member of the struct.

In memory, you could have many struct dirent like this:

[0]                    [1]                                           [2]
44,0,24,DT_REG,"a.txt",41,0,47,DT_DIR,"a_really_long_directory_name",...

Here is a rough translation of how it works:

uint8_t buf[BUFLEN];
long n = getdents64(dfd, buf, BUFLEN);
if (n < 0) {
    // error
}

// buf now holds n bytes of variable-length dirent structs

struct dirent *d;
for (long i = 0; i < n; i += d->d_reclen) { // <<<< this is the trick
    d = (struct dirent *)&buf[i];
    // do something with d
}

Notice the way we increment i. Since the d_name member does not have an official size, we cannot just say struct dirent d[COUNT];. We don't know how big each struct will be.
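
A fuller, Linux-only version of the same idea might look like the sketch below. It assumes the raw system call is reached through syscall(SYS_getdents64, ...) and declares the record layout locally, following the layout given in the getdents(2) man page. The helper name walk_raw and the 4096-byte buffer size are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout for getdents64 as described in getdents(2); declared here
 * because libc headers do not necessarily expose it under this name. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;   /* size of this whole record in bytes */
    unsigned char  d_type;
    char           d_name[];   /* NUL-terminated, variable length */
};

/* Hypothetical helper: prints every name in `path` and returns the number
 * of entries seen (including "." and ".."), or -1 on error. */
long walk_raw(const char *path)
{
    enum { BUFLEN = 4096 };

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    char *buf = malloc(BUFLEN);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long count = 0;
    long n;
    /* Each call fills buf with as many whole records as fit; 0 means end. */
    while ((n = syscall(SYS_getdents64, fd, buf, BUFLEN)) > 0) {
        for (long i = 0; i < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + i);
            puts(d->d_name);
            count++;
            i += d->d_reclen;  /* step by this record's own length */
        }
    }

    free(buf);
    close(fd);
    return n < 0 ? -1 : count;
}
```

The inner loop is the "iteration" the question asks about: there is no fixed stride, so the only way to reach the next entry is to add the current entry's d_reclen.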

CodePudding user response:

Where does the iteration happen?

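On Linux, it happens in glibc's Linux-specific readdir implementation (sysdeps/unix/sysv/linux/readdir.c), not in the generic stub in dirent/readdir.c. As you can see, the code repeatedly calls getdents (a system call) to obtain a batch of entries from the kernel, and "advances" through that batch by updating dirp->offset, etc.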

/* Read a directory entry from DIRP.  */
struct dirent *
__readdir_unlocked (DIR *dirp)
{
  struct dirent *dp;
  int saved_errno = errno;

  if (dirp->offset >= dirp->size)
    {
      /* We've emptied out our buffer.  Refill it.  */

      size_t maxread = dirp->allocation;
      ssize_t bytes;

      bytes = __getdents (dirp->fd, dirp->data, maxread);
      if (bytes <= 0)
        {
          /* Linux may fail with ENOENT on some file systems if the
             directory inode is marked as dead (deleted).  POSIX
             treats this as a regular end-of-directory condition, so
             do not set errno in that case, to indicate success.  */
          if (bytes == 0 || errno == ENOENT)
            __set_errno (saved_errno);
          return NULL;
        }
      dirp->size = (size_t) bytes;

      /* Reset the offset into the buffer.  */
      dirp->offset = 0;
    }

  dp = (struct dirent *) &dirp->data[dirp->offset];
  dirp->offset += dp->d_reclen;
  dirp->filepos = dp->d_off;

  return dp;
}
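
To see the refill-and-advance pattern in isolation, here is a toy model that runs without any kernel involvement. It is not glibc code: toy_dir, toy_getdents and toy_readdir are hypothetical names, the "kernel" is just an array of strings, and the buffer is deliberately tiny (16 bytes) so that several refills occur. It also assumes every name fits in the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Toy directory stream. The offset/size fields play the same role as in
 * glibc's DIR, but everything else is made up for illustration. */
struct toy_dir {
    const char **names;    /* stand-in for what the kernel would return */
    size_t count, next;    /* number of names, index of the next to pack */
    size_t offset, size;   /* read position and valid bytes in data */
    char data[16];         /* small buffer to force multiple refills */
};

/* Stand-in for getdents: packs as many [reclen][name\0] records as fit
 * into d->data. Returns the bytes written, or 0 when nothing is left. */
static size_t toy_getdents(struct toy_dir *d)
{
    size_t used = 0;
    while (d->next < d->count) {
        size_t len = strlen(d->names[d->next]) + 1;
        unsigned short reclen = (unsigned short)(sizeof reclen + len);
        if (used + reclen > sizeof d->data)
            break;                      /* record goes into the next batch */
        memcpy(d->data + used, &reclen, sizeof reclen);
        memcpy(d->data + used + sizeof reclen, d->names[d->next], len);
        used += reclen;
        d->next++;
    }
    return used;
}

/* Returns the next name, or NULL at the end. The pointer refers to the
 * internal buffer and is only valid until the next refill. */
const char *toy_readdir(struct toy_dir *d)
{
    if (d->offset >= d->size) {         /* buffer drained: refill it */
        size_t bytes = toy_getdents(d);
        if (bytes == 0)
            return NULL;
        d->size = bytes;
        d->offset = 0;
    }

    unsigned short reclen;
    memcpy(&reclen, d->data + d->offset, sizeof reclen);
    const char *name = d->data + d->offset + sizeof reclen;
    d->offset += reclen;                /* advance by this record's length */
    return name;
}
```

A stream initialised as struct toy_dir d = { .names = names, .count = 3 } can then be drained with while ((name = toy_readdir(&d)) != NULL), which has the same shape as the loop in the question. The state that a caller might expect to find in a loop counter lives inside the stream object and is updated on every call.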