I'm working on a git diff parser. The main task is to find all changed function signatures. Sometimes in the chunk line with @@@ .... @@@ this information is present, but sometimes it is not. Last time I changed the cout message in greet(). It is visible in the first image as a changed line, which is correct, but above it, in the @@@... line, "void functOne() {" appears, and that function was not changed. The second picture shows a dummy C++ source file I use to test git diff.
The main questions are:
How can I list the signatures of all changed functions?
Why does an unchanged function name sometimes appear?
Why does the line with @@@.... sometimes contain no function name/signature at all?
CodePudding user response:
The git diff command doesn't care about functions. Git repositories can contain any kind of text file (binary files too, but that's immaterial here), not just C source.
The diff command doesn't attempt to interpret the file in any way. Only a C compiler can fully understand a C file and process all function declarations.
The diff command only looks for discrete lines of text that changed and shows them together with a few unchanged lines that precede and follow them.
If the changed lines happen to be at the beginning of a function declaration, then this would include the function declaration. If they are in the middle of a long function, you only see the few preceding lines, that's it.
There are git diff options that control how many unchanged lines are shown (see -U<n> / --unified=<n> in git's documentation). Specifying a million lines, for example, results in the entire file being shown, with all the changed lines marked up.
You can do that if you wish, then try to figure out the names of all the changed functions yourself, but until you write a complete C compiler yourself, your heuristic parsing attempts won't be 100% correct. You might've noticed, tucked away in the git diff output, an indication of what git guessed the changed function might be. But since git is also not a C compiler, that guess is occasionally wrong too.
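If you want to read git's guess programmatically, a minimal sketch along these lines may help. It assumes ordinary two-way unified diff output, where each hunk header looks like "@@ -a,b +c,d @@ optional text". The names HUNK_RE, hunk_headers and the sample diff are made up for illustration, and the sample is hand-written rather than real git output.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@ [section]"
# (HUNK_RE and hunk_headers are hypothetical names for this sketch.)
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def hunk_headers(diff_text):
    """Return (new_start, new_len, section) for every hunk header in diff_text."""
    found = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            new_start = int(m.group(3))
            # The length is omitted in the header when it equals 1.
            new_len = int(m.group(4)) if m.group(4) is not None else 1
            found.append((new_start, new_len, m.group(5)))
    return found


# Hand-written sample: the first hunk has no section text, the second has one.
sample = """\
diff --git a/main.cpp b/main.cpp
--- a/main.cpp
+++ b/main.cpp
@@ -1,3 +1,4 @@
 #include <iostream>
+#include <string>
 using namespace std;
 void functOne() {
@@ -8,5 +9,5 @@ void functOne() {
 }
 void greet() {
-    std::cout << "hello";
+    std::cout << "hi";
 }
 int main() {
"""

for new_start, new_len, section in hunk_headers(sample):
    print(new_start, new_len, repr(section))
```

The section text is only git's guess at the enclosing function, so treat an empty or surprising value as normal rather than as a parsing error.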
CodePudding user response:
Sometimes in the chunk line with @@@ .... @@@
Git calls this a hunk header (after other diff software that also calls it that).
... contains [the function name] but sometimes not.
What Git puts in the function section of a diff hunk header is produced by matching earlier lines against a particular regular expression, as described in the gitattributes documentation under xfuncname (search for that string). But note that this is a regular expression, and regular expressions are inherently less capable than parsers: whatever regular expression you write, there will always be valid C constructs that a parser can handle but that the expression fails to recognize.
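This also explains why an unchanged function can show up. The search starts above the first line of the hunk, and the hunk begins with unchanged context lines, so when the changed function's own signature falls inside that context, the nearest match above the hunk is the previous function. Below is a simplified model of that lookup, not Git's actual code. FUNCNAME_RE is a hypothetical stand-in pattern (a line that starts with a letter, '_' or '$'), and section_for_hunk and the source listing are invented for the example.

```python
import re

# Hypothetical stand-in for a funcname pattern: a line that starts at
# column 0 with a letter, '_' or '$'. A real xfuncname pattern may differ.
FUNCNAME_RE = re.compile(r"^[A-Za-z_$]")


def section_for_hunk(lines, hunk_start):
    """Return the nearest line above hunk_start (1-based) that matches
    FUNCNAME_RE, or '' when no earlier line matches."""
    # hunk_start - 2 is the 0-based index of the line just above the hunk.
    for idx in range(hunk_start - 2, -1, -1):
        if FUNCNAME_RE.match(lines[idx]):
            return lines[idx].rstrip()
    return ""


source = [
    "#include <iostream>",         # line 1
    "",                            # line 2
    "void functOne() {",           # line 3
    "    int a = 1;",              # line 4
    "    int b = 2;",              # line 5
    "    int c = 3;",              # line 6
    "}",                           # line 7
    "",                            # line 8
    "void greet() {",              # line 9
    '    std::cout << "hello";',   # line 10, the edited line
    "}",                           # line 11
]

changed_line = 10
context = 3  # number of unchanged lines shown before the change
hunk_start = max(1, changed_line - context)  # the hunk starts at line 7

print(section_for_hunk(source, hunk_start))
```

Here "void greet() {" sits on line 9, inside the hunk's leading context, so the lookup walks upward from line 6 and lands on "void functOne() {". A hunk that starts at line 1 has nothing above it, which is one way to end up with no function text at all.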
If Git's built-in C xfuncname pattern is not adequate for your use, you can write your own pattern. But it's always going to be limited because regular expressions can only recognize regular languages (these are CS-theoretical or informatics terms, not to be interpreted as ordinary English language; for more, see, e.g., Regular vs Context Free Grammars).
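To make that limitation concrete, here is a small sketch with a hand-written, line-based pattern. SIGNATURE_RE is hypothetical and is not the pattern Git ships. It is applied to one line at a time, so it misses a signature that is split across lines and wrongly accepts a macro invocation that merely looks like a definition.

```python
import re

# Hypothetical single-line "function signature" pattern, for illustration only:
# an identifier-ish prefix, a parenthesised argument list, an optional '{'.
SIGNATURE_RE = re.compile(r"^[A-Za-z_][\w\s:*&<>~]*\([^()]*\)\s*\{?\s*$")

candidates = [
    "void greet() {",        # ordinary definition on one line
    "int add(int a,",        # signature continues on the next line
    "DECLARE_THING(foo)",    # macro invocation, not a function definition
    "    return 0;",         # indented statement inside a body
]

for line in candidates:
    verdict = "match" if SIGNATURE_RE.match(line) else "no match"
    print(f"{verdict:8} | {line}")
```

Tightening the pattern can fix any one of these cases, but some other construct will then slip through, which is the trade-off described above.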