Consider the Chromium codebase. It's huge, around 4 GB of pure code, if I'm not mistaken. But however humongous it may be, it's still modular by nature, and it implements a lot of interesting features in its internals.
For example, I'd like to extract the WebSocket implementation from the sources, but that's not easy to do by hand. If we go to https://github.com/chromium/chromium/tree/main/net/websockets we'll see lots of header files. To compile the code as a "library" we'd need them and their implementations in .cpp files. The trick is that these header files include other header files in other directories of the Chromium project, and those in turn include others...
But if there are no circular dependencies, we should be able to get to the bottom of this tree, where header files include nothing (or only include headers of already compiled libraries). At that point all the files needed for this dependency subtree are in place, so we can compile a chunk of the original codebase separately from the rest of it.
That's the idea. At least in theory.
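A rough sketch of that traversal in Python, to make the idea concrete. It assumes that quoted includes are written relative to the repository root, and it only follows `#include "..."` lines textually, so it ignores preprocessor conditionals and says nothing about which source files implement each header. The function name `include_closure` is made up for this illustration.

```python
import re
from pathlib import Path

# Matches quoted includes such as: #include "net/websockets/websocket_channel.h"
# Angle-bracket includes (<vector>) are treated as system headers and skipped.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def include_closure(root, start_files):
    """Return every file under `root` reachable from `start_files` via #include.

    Assumption: include paths are relative to `root`.
    """
    root = Path(root)
    seen = set()
    pending = list(start_files)
    while pending:
        rel = pending.pop()
        if rel in seen:
            continue  # already visited, so include cycles cannot loop forever
        path = root / rel
        if not path.is_file():
            continue  # generated or external header; not resolvable here
        seen.add(rel)
        text = path.read_text(errors="ignore")
        pending.extend(INCLUDE_RE.findall(text))
    return seen
```

Because visited files are remembered, circular includes do not actually prevent the walk from terminating; they only mean the result is a graph rather than a tree.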
Does anyone know how it could be done? I've found this repo and this repo, but they only show the dependency graph and do not have the functionality to extract a tree from it.
There should be a tool for this already, I suppose. It's just hard to phrase the question for Google. Or perhaps I'm mistaken and this approach wouldn't really work?
CodePudding user response:
Your compiler is almost surely capable of extracting this dependency information so that the build system can use it to figure out incremental builds. In gcc, for instance, we have the -MMD flag, which writes out the user headers each compilation unit depends on (system headers are omitted).
Suppose we have four compilation units, ball.cpp, football.cpp, basketball.cpp, and hockey.cpp. Each source file includes a header file of the same name. Also, football.h and basketball.h each include ball.h.
If we run
g++ -MMD -c -o football.o football.cpp
g++ -MMD -c -o basketball.o basketball.cpp
g++ -MMD -c -o hockey.o hockey.cpp
g++ -MMD -c -o ball.o ball.cpp
then this will produce, in addition to the object files, some files with names like basketball.d that contain dependency information like
basketball.o: basketball.cpp basketball.h ball.h
It's simple enough to read these into, say, a Python script, and then just take the union of all the dependencies of the files you want to include.
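As a minimal sketch of such a script (the function names `parse_depfile` and `collect_dependencies` are my own, and it assumes file names contain no spaces or colons): it joins the backslash-continued lines that the compiler emits for long rules, splits each rule at the first colon, and unions the dependencies of every matching target.

```python
import glob


def parse_depfile(text):
    """Parse Makefile-style dependency text into {target: set(dependencies)}."""
    deps = {}
    # Long rules are wrapped with a trailing backslash; join them first.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, rest = line.partition(":")
        deps.setdefault(target.strip(), set()).update(rest.split())
    return deps


def collect_dependencies(depfile_paths, wanted):
    """Union the dependencies of every target whose name contains `wanted`."""
    result = set()
    for path in depfile_paths:
        with open(path) as f:
            for target, files in parse_depfile(f.read()).items():
                if wanted in target:
                    result |= files
    return result


if __name__ == "__main__":
    # Print everything the "ball"-related objects depend on, one per line.
    for name in sorted(collect_dependencies(glob.glob("*.d"), "ball")):
        print(name)
```

Run in the directory holding the .d files from the example, this should list the six ball-related sources and headers and leave out the hockey files.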
EDIT: In fact, Python may even be overkill. In the situation above, if you wanted to get all dependencies for anything containing the word "ball," you could do something like
$ cat *.d | awk -F: '$1 ~ "ball" { print $2 }' | xargs -n 1 echo | sort | uniq
which will output
ball.cpp
ball.h
basketball.cpp
basketball.h
football.cpp
football.h
If you're not used to reading UNIX pipelines, this:
- Concatenates all the *.d files in the current directory;
- Goes through them line-by-line, splitting each line into fields delimited by : characters;
- Prints out the second field (i.e. the list of dependencies) for any line where the first field (i.e. the target) matches the regex "ball";
- Splits the results into individual lines;
- Sorts the resulting lines; and
- Throws out any duplicates.
You can see that this produced a list of everything the ball-related files depend on, but skipped hockey.cpp and hockey.h, which aren't dependencies of any file with "ball" in its name. (Of course in your case you might use "websockets" instead of "ball," and if there is some directory structure instead of everything being in the root directory you may have to do a bit to compensate for that.)