How much of modern FFmpeg is written by Fabrice Bellard?

Time:12-17

FFmpeg is considered by many to be the work of Fabrice Bellard, and maybe even his magnum opus, but since he stopped contributing to the project (under the pseudonym Gérard Lantau) in 2004, I wondered how much of it can actually still be said to be his. By comparison, Linus Torvalds' Wikipedia page states:

As of 2006, approximately 2% of the Linux kernel was written by Torvalds himself.[28] Because thousands have contributed to it, his percentage is still one of the largest. However, he said in 2012 that his own personal contribution is now mostly merging code written by others, with little programming.

This despite the fact that Torvalds is still an active contributor to the Linux kernel, whereas Bellard hasn't been an active contributor to FFmpeg for almost two decades.

FFmpeg being an open-source project tracked with Git, it seems like the question should be technically and objectively answerable, but as someone who hates mailing lists and the generally archaic ways that big open-source projects like to do things, I wouldn't know where to start in doing so.

So just how much of the modern FFmpeg codebase is Fabrice Bellard actually responsible for, compared to the other FFmpeg devs still actively working on the project?

CodePudding user response:

Naive answer: calculating percentage of commits

This was simpler to do than I expected. It turns out it can all be done in Git.

First I cloned FFmpeg from its Git server and waited a few minutes for Git to download the several hundred megabytes that make up the FFmpeg codebase:

git clone https://git.ffmpeg.org/ffmpeg.git

Since git shortlog -sne --all prints a full list of contributors by number of commits, I did:

$ git shortlog -sne --all | grep fabrice
613  Fabrice Bellard <[email protected]>

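Interestingly, git shortlog -sne --all | grep lantau doesn't return anything, despite "Gerard Lantau" widely being cited as the pseudonym that he wrote FFmpeg under. (grep is case-sensitive, so this only checks the lowercase spelling, as it would appear in an email address; grep -i lantau also covers the capitalised name column.)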

I then got a list of all 613 of Bellard's commits with:

git log --author="Fabrice Bellard"

This shows that the last of these commits was made in 2015.

Doing:

git log --author="Fabrice Bellard" --reverse

...shows that the first one was made in December 2000, via Subversion:

commit 9aeeeb63f7e1ab7b0b7bb839a5f258667a2d2d78   
Author: Fabrice Bellard <[email protected]>
Date:   Wed Dec 20 00:02:47 2000 +0000

Initial revision

Originally committed as revision 2 to svn://svn.ffmpeg.org/ffmpeg/trunk
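
The first and last dates can also be pulled out directly instead of reading them off two log listings. This is a minimal sketch, assuming it is run inside the clone; the helper names first_and_last and author_date_range are made up for illustration and are not part of Git:

```shell
# first_and_last: read one YYYY-MM-DD date per line on stdin and print
# the earliest and the latest (a single input line is printed twice).
first_and_last() {
    sort | sed -n '1p;$p'
}

# author_date_range AUTHOR: earliest and latest author dates of commits
# by AUTHOR on any ref. Must be run inside a Git work tree.
author_date_range() {
    git log --all --author="$1" --format='%ad' --date=short | first_and_last
}
```

Calling author_date_range "Fabrice Bellard" should print two dates. Sorting is done explicitly because git log does not order its output strictly by author date.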

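As a naive answer to the question, I can calculate the number of commits Fabrice Bellard made as a percentage of all the commits ever made to FFmpeg. git log --all | wc -l prints 1412173, but note that this counts lines of log output rather than commits (each commit prints a header, a message and blank lines), so it overstates the commit total several times over; git rev-list --count --all gives the actual number of commits. git shortlog -sne --all | wc -l lists 2,368 contributor entries, i.e. distinct name/email pairs, which is an upper bound on the number of developers.

613 as a percentage of 1,412,173 is about 0.043%. Because that denominator is inflated, Bellard's true share of commits is higher than this, and in any case it is a share of commits, not of the codebase. Either way, the other ~2,000 contributors account for the overwhelming majority of commits.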
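
To count commits themselves rather than lines of log output, git rev-list --count can be used. The following is a hedged sketch, assuming it is run inside the clone; percent and commit_share are illustrative helper names, not Git commands:

```shell
# percent PART TOTAL: print PART as a percentage of TOTAL, 4 decimals.
percent() {
    awk -v p="$1" -v t="$2" 'BEGIN {
        if (t == 0) exit 1          # avoid dividing by zero
        printf "%.4f\n", 100 * p / t
    }'
}

# commit_share AUTHOR: percentage of all commits reachable from any ref
# whose author field matches AUTHOR. Must be run inside a Git work tree.
commit_share() {
    total=$(git rev-list --count --all)
    mine=$(git rev-list --count --all --author="$1")
    percent "$mine" "$total"
}
```

For example, commit_share "Fabrice Bellard" prints his share of commits, and percent 613 1412173 reproduces the 0.0434 figure obtained from the line count.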

This is interesting, but commit counts don't seem like they would paint a realistic picture. To me, a more interesting but more complex metric would be how many lines of code written by Fabrice Bellard still exist in the FFmpeg codebase. I don't know if this is possible with Git, and if it is, I definitely don't know how to do it accurately.

CodePudding user response:

With some 8000 files in the repo containing a total of nearly 2 million lines, running git blame on each file will take a long time, but it would let you see how many lines that Bellard/Lantau contributed are still in the repo. As @Gyan says, though, this will only report lines that are exactly as he wrote them. Any change in whitespace or style will be attributed to the person who made those trivial changes.

That being said, here's the loop:

git clone https://github.com/FFmpeg/FFmpeg
cd FFmpeg
git ls-tree -r --name-only HEAD | while IFS= read -r f ; do git blame -- "$f" ; done > blame

That loop will take a long time to run, but eventually you'll be able to extract the author from each line with something like this to get the number of lines last touched by each author:

sed -e 's/^[^(]*(//' -e 's/ *20[012][0-9]-.*//' blame | sort | uniq -c | sort -n

Or just run wc -l blame for the total and grep "Fabrice Bellard" blame | wc -l to see only Bellard's count.

I don't see the name Lantau anywhere in the output, so the count for Bellard should be the number you're looking for.

As of posting this answer, my blame loop has looked at 200 K lines, and attributes 1980 of them to Bellard, or almost 1%. I'll edit this answer with final numbers once they're computed.
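
A variant that avoids parsing dates out of the human-readable blame output is to use git blame --line-porcelain, which emits an "author <name>" header for every line. This is only a sketch under a few assumptions: it runs inside the clone, the helper names blame_all and tally_authors are invented here, and file names containing unusual characters (which git ls-tree quotes by default) are not handled:

```shell
# tally_authors: read `git blame --line-porcelain` output on stdin and
# print "count author" pairs, largest count first. Source lines in the
# porcelain format start with a TAB, so they never match '^author '.
tally_authors() {
    sed -n 's/^author //p' | sort | uniq -c | sort -rn
}

# blame_all: blame every file tracked at HEAD. -w tells git blame to
# ignore whitespace-only changes when assigning lines to commits.
blame_all() {
    git ls-tree -r --name-only HEAD |
    while IFS= read -r f; do
        git blame -w --line-porcelain HEAD -- "$f"
    done
}
```

Running blame_all | tally_authors gives one row per author. Because only the header field is counted, a copyright line that merely mentions a name in the source text is not miscounted, which a plain grep for the name on the default blame output could do.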
