Aren't header-only C "libraries" wasteful?

I'm looking at a header-only C "library": https://github.com/zserge/jsmn/blob/master/jsmn.h

As far as I can understand, this code will be compiled into every object file where the .c file includes jsmn.h, wasting space.

(The file's function definitions are inside #ifndef JSMN_HEADER, so you could use this as a "traditional" header file by defining JSMN_HEADER.)

Why hasn't it been written as a "traditional" .c and .h pair?
Is the linker clever enough to dedup identical function definitions between object files? I would have expected "duplicate symbol" errors.
What advantage does putting code in headers give in C? (Not C++.)
From where do you get the function definitions if you use #define JSMN_HEADER before importing?
Is jsmn.h being header-only a clever trick, from which I can learn?

CodePudding user response：

The header expands into one of the following, depending on the macros defined:

1. No macros: function definitions (public functions are non-static, private ones are static).
2. JSMN_HEADER: function declarations (for public functions only).
3. JSMN_STATIC: static function definitions (for both private and public functions).

If only a single translation unit (TU) needs the library, you can freely choose between (1) and (3).

If more than one TU needs the library, you need (1) in one of the TUs and (2) in all others. Then the only thing that's "wasted" is preprocessor time for skipping function definitions for (2), but it shouldn't matter since they're so tiny.

In this case, trying to go with (1) for all TUs would give you "duplicate symbol" linker errors. Trying to go with (3) for all TUs would silently duplicate the symbols, which would be wasteful.
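
The three expansions and the multi-TU arrangement described above can be sketched with a miniature stand-in header. Everything named here (tiny.h, TINY_HEADER, TINY_STATIC, TINY_API, tiny_count_char, tiny_demo) is hypothetical and invented for illustration; the macros play the roles of JSMN_HEADER and JSMN_STATIC, and this is not the actual jsmn source.

```c
/* ---- tiny.h: hypothetical header-only library (not the real jsmn.h) ---- */
#ifndef TINY_H
#define TINY_H

#ifdef TINY_STATIC
#define TINY_API static   /* mode 3: public functions become private to this TU */
#else
#define TINY_API extern   /* modes 1 and 2: public functions have external linkage */
#endif

/* Declaration of the public function: emitted in every mode. */
TINY_API int tiny_count_char(const char *s, char c);

#ifndef TINY_HEADER
/* Definitions: emitted in modes 1 and 3, skipped in mode 2. */

/* Private helper: always static, so it never reaches the linker's symbol table. */
static int tiny_matches(char a, char b) { return a == b; }

TINY_API int tiny_count_char(const char *s, char c) {
    int n = 0;
    for (; *s != '\0'; ++s)
        n += tiny_matches(*s, c);
    return n;
}
#endif /* TINY_HEADER */

#endif /* TINY_H */

/*
 * Intended layout in a multi-file program (shown as a comment because
 * this sketch is a single file):
 *
 *   impl.c:     #include "tiny.h"        mode 1, the one TU with definitions
 *   a.c, b.c:   #define TINY_HEADER
 *               #include "tiny.h"        mode 2, declarations only
 */

/* ---- user code in the same TU ---- */
#include <assert.h>

/* Example caller; in a real program this would be called from main() or a test. */
void tiny_demo(void) {
    assert(tiny_count_char("a,b,c", ',') == 2);
}
```

Compiled as-is, this file corresponds to mode (1). Defining TINY_STATIC at the top would switch it to mode (3). Defining TINY_HEADER instead would leave only the declaration, so the program would link only if some other TU supplied the definition.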

Why hasn't it been written as a "traditional" .c and .h pair?

For supposed convenience of distributing and using the library.

Is the linker clever enough to dedup identical function definitions between object files? I would have expected "duplicate symbol" errors.

It's probably not clever enough, but it doesn't need to be.

You're expected to define the macros so that only a single TU has the definitions.

What advantage does putting code in headers give in C?

Fewer source files to distribute.

From where do you get the function definitions if you use #define JSMN_HEADER before importing?

From your other TU that didn't define this macro before including the header.

Is jsmn.h being header-only a clever trick, from which I can learn?

Yes.

CodePudding user response：

Whether it is beneficial depends on the actual application and target platform. Embedded-platform compilers were, and often still are, "special snowflakes" whose behaviour departs from industry standards for the sake of efficiency.

"Embedded" is an umbrella term referring to a kind of application rather than a type of hardware platform. Your smartphone running Android, an old 1999 cellphone, a Keysight network analyzer running Linux with Wayland support and OpenGL capabilities, a bank ATM running MS Windows with a kiosk application written in Qt 5.15, an ADSL router with a MIPS CPU, an aircraft's navigation console with multiple ARM single-board computers in it, a SCADA data acquisition unit on some production line, a Raspberry-controlled drone: all of those are embedded applications. They have different requirements, different modes of operation, and different environments.

There is a term that gets mixed up with embedded: real-time. Real-time is actually a requirement on the application and hardware to react in a predetermined way. The router, drone and aircraft console might be considered such, while an ATM doesn't need to be real-time and doesn't need to be miniature (on the contrary, it's preferable that no one be able to drag it away).

How could a fully static set of functions in a header be better?

Because a static function's definition is visible in every translation unit that uses it, the compiler can inline it at each call site. Compilers did this before ISO C99 added the inline keyword, and conforming compilers remain free to do so whether or not the function is declared inline: the keyword is only a hint, and for linkage purposes static inline is just static.

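Unused static functions are dropped from each translation unit (usually by the compiler itself, since nothing outside the unit can reference them), so only the functions that are actually used may end up duplicated.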
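
As a small illustration of static versus static inline functions living in a header, here is a minimal sketch. The function names span_plain, span_hinted and span_demo are hypothetical. Both functions have internal linkage, so each TU that includes them gets its own private copy; whether calls are actually inlined is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Plain static: internal linkage, one private copy per including TU. */
static size_t span_plain(const char *s, char stop) {
    size_t n = 0;
    while (s[n] != '\0' && s[n] != stop)
        ++n;
    return n;
}

/* static inline (C99 and later): same internal linkage as above;
   the inline keyword only suggests inlining to the compiler. */
static inline size_t span_hinted(const char *s, char stop) {
    size_t n = 0;
    while (s[n] != '\0' && s[n] != stop)
        ++n;
    return n;
}

/* Example caller; both variants compute the same result. */
void span_demo(void) {
    assert(span_plain("key:value", ':') == 3);
    assert(span_hinted("key:value", ':') == 3);
}
```

One practical difference: with warnings enabled, GCC and Clang typically report an unused plain static function (-Wunused-function) but stay silent about an unused static inline one, which is one reason headers tend to prefer the latter.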

There are platforms with different calling conventions and memory models, leading to the existence of "short" and "long" pointers and calls. A call between translation units has to be a long one so that the function can be called from any unit. Within a single unit the call is likely to be "short" (and faster?), and this can be determined automatically because the function is used only locally.

On some real-time platforms there is no real OS, kernel or thread/process manager, just a loader and bootstrap code linked to the user's program as a library. Some of those are able to run code in threads asynchronously, in which case the absence of deduplication is beneficial: it prevents execution flow from entering the same function from different asynchronous calls. It is an isolation measure that reduces conflicts at the cache level, and it also allows a loaded unit to be debugged externally if one wants to monitor calls to a certain function made from a certain thread.

Finally, as happens a lot in this industry, it might be a case of ritual programming. I would distinguish it from cargo cult code, as it could be done out of habit rather than out of lack of knowledge. Even though the target platform had none of the factors listed above, the code was written in a "familiar" style.