Home > Back-end >  Why does BSD use a double pointer in a linked list entry?
Why does BSD use a double pointer in a linked list entry?

Time:09-28

This is the BSD implementation of an entry in a double linked list?

#define LIST_ENTRY(type)                        \
struct {                                \
    struct type *le_next;   /* next element */          \
    struct type **le_prev;  /* address of previous next element */  \
}

Why do they use a double pointer? Why not just have

#define LIST_ENTRY(type)                        \
struct {                                \
    struct type *le_next, *le_prev;     \
}

I dont see any problem with that, and it looks simpler in my opinion, so why not use it? Am I missing something?

CodePudding user response:

Why do they use a double pointer?

The source file contains reasonably good explanatory comments. The ones that apply to this particular data structure are:

 * A list is headed by a single forward pointer (or an array of forward
 * pointers for a hash table header). The elements are doubly linked
 * so that an arbitrary element can be removed without a need to
 * traverse the list. New elements can be added to the list before
 * or after an existing element or at the head of the list. A list
 * may only be traversed in the forward direction.

There are several key things to note about that, among them:

  1. Even though the nodes have pointers by which the previous node can be accessed, this kind of list is intended for forward traversal only.

  2. The list is headed by a single pointer. It is implied, and this can be verified elsewhere in the source, that that single pointer is not a member of a list node object.

  3. The reason for the double-linkage is states as "so that an arbitrary element can be removed without a need to traverse the list". Reading just a little between the lines, we can take that to mean that if I have a pointer to a node then I can remove that node from the list to which it belongs, outside the context of a list traversal, and even if it is the first node in the list.

    Note well in this regard that the first node in such a list does not have a predecessor node that a struct type * could point to (see (2) above), but for a node to be in such a list at all, there must be a struct type * pointing to it, whether in another node or in the list object. Given a struct type ** pointing to that pointer, we can effect removals by modifying that pointer directly, instead of through the structure containing it.

  4. New elements can be added before or after any node. For much the same reason that the double pointer facilitates deleting the first node, it also facilitates inserting a new node before the first -- again, without traversing the list or knowing that the node is in fact first, or even which list it belongs to.

Why not just have

#define LIST_ENTRY(type)                        \
struct {                                \
    struct type *le_next, *le_prev;     \
}

I dont see any problem with that, and it looks simpler in my opinion, so why not use it? Am I missing something?

That would be the conventional form for a node in a doubly-linked list. It could provide the same removal and insertion semantics described in (3) and (4) above, provided that the list was structured with a dummy head node. It's a bit obscure to me how this structure is actually used, however, so perhaps there's a reason why that would be unworkable. Or perhaps this choice is simply for a little bit of space efficiency.

In any case, that's not how the list is structured (see (2)). Given the chosen structure for the list head, the double pointer structure for previous nodes is required to serve the stated purposes of this list and its nodes. Because the list is intended to be traversed in the forward direction only, the double-pointer structure is not an impediment.

CodePudding user response:

@lundin you are mistaken. queue(3) was invented in the early 1990s (it first made it to 4.4BSD; see e.g. the NetBSD queue(3) manpage), a time when function inlining and cache memories were very much mainstream. The "not a doubly-linked-list" point is meaningless. Is a doubly-linked-list in Smalltalk-80, whose references are indexes into an Object Table which itself points to extra instance variables of the object, not a doubly-linked list because the linkage is not in the form of direct pointers? Which authority has decreed that

Inlining indeed is one of Allen's "Eight Optimising Transformations" - which she detailed over 50 years ago! Are you perhaps hung up on the use of macros? This is the C programming language; it has a macro preprocessor, and macros are almost unavoidable if you would build type-safe generic data structures (as queue(3) offers.) If what you believe is that macros are used to force inlining, then that is an incorrect assumption. It is to combine type safety with genericity that they are used.

Your point on cache memories is also unclear. What are you trying to say? I assume you want to argue that the indirection here is an inefficiency. But as the le_prev pointer points to the le_next member of the previous element, i.e. within the element which you would be pointing to anyway (in the alternative case where you have a pointer to the previous element, if there is one), the entire object itself (if it fits within a cache line) may be fetched. In the alternative case, where le_prev is a pointer to the previous element itself, then you may well have the linkage beyond the first cache line. I do not believe there is a clear benefit here.

More importantly, this generic treatment of what can be a pointer to either a previous element's next-element pointer, or the list head's first-element pointer, eliminates a conditional branch (is there a previous element? if not, then modify the list head's first-element pointer instead). Eliminating this branch is a clear benefit. It also makes for a more ergonomic interface (no need to pass the list head when doing insertion-before.)

Finally - "archaeology and nostalgia" is a strange way to talk about something which is used to this day in numerous places. All the modern BSDs use it (XNU uses a mixture of queue(3), Mach's list.h, and soemthing else in the C code). OpenSSH and other software of BSD extraction uses it, as do many other programs. I think that without recourse to superficial or aesthetic concerns, this argument simply cannot be made.

  • Related