I'm trying to compute the source code size of gcc by considering cpp files first:
# NOTE: the cpp loop finishes immediately
LOC=0
BYTES=0
FILES=$(find . -name "*.cpp")
for f in ${FILES};
do
BYTES_TAG=$(stat --printf="%s" "$f")
LOC_TAG=$(cat "$f" | wc -l)
BYTES=$((BYTES + BYTES_TAG))
LOC=$((LOC + LOC_TAG))
done
echo "LOC = $LOC, SIZE = $BYTES"
Then I try to sum the *.c files, but the bash loop doesn't stop. Here is my gcc version:
$ wget https://ftp.gnu.org/gnu/gcc/gcc-11.2.0/gcc-11.2.0.tar.gz
$ tar -xf gcc-11.2.0.tar.gz
This is weird because counting all the files is immediate with:
$ find . -type f | wc -l
CodePudding user response:
Size of all *.c and *.cpp files in bytes:
find . \( -name "*.cpp" -o -name "*.c" \) -exec wc -c {} \; | sed "s/ .*//" | paste -sd+ - | bc
Number of lines in all *.c and *.cpp files:
find . \( -name "*.cpp" -o -name "*.c" \) -exec wc -l {} \; | sed "s/ .*//" | paste -sd+ - | bc
Explanation: `find ... -exec` executes a command on all files it finds, replacing the `{}` in the `-exec` part with the file name(s). If you end the `-exec` part with `\;`, it will be executed once for each file. In some cases, ending the `-exec` part with `+` is more efficient -- this will execute the command with as many files as will fit in one command line.
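A quick way to see the difference between the two terminators, using `echo` as a stand-in for the real command (a sketch, assuming a directory with several .c files):

```shell
# \; runs the command once per file: one echo invocation (one output line) per file
find . -name "*.c" -exec echo {} \;

# + packs as many file names as fit into each invocation:
# typically a single echo listing all the files on one line
find . -name "*.c" -exec echo {} +
```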
`wc -l` and `wc -c` will print one line per file, with the number of lines / characters (bytes) followed by the file name. The `sed "s/ .*//"` will cut away everything after the number. `paste -sd+ -` will then concatenate all the lines (`-s`) and separate them with plus signs (`-d+`). Piping this to `bc` makes `bc` execute the addition and give you the total you are looking for.
Meta answer: Learn about `find ... -exec`. Don't loop over `find` output, because you will get into trouble e.g. when file names contain spaces -- and `xargs` is in most cases unnecessary.
CodePudding user response:
Then I try to sum the *.c files, but the bash loop doesn't stop
You just didn't wait long enough. Bash is a very slow programming language. For every single loop iteration, your program forks a subshell and does a fork/exec of `stat`, `cat` and `wc`. And you are touching each file twice -- once by `stat`, then by `cat`. That's a lot of processes, a lot of work, and double the needed I/O.
Write the same code in AWK, Python or Perl, or in C or C++, and it will be much faster.
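For instance, an AWK version that does both sums in one pass over the `wc` output (a sketch; the `$3 != "total"` guard skips the per-batch total lines that `wc` emits when `-exec ... +` hands it several files at once):

```shell
# sum lines ($1) and bytes ($2) reported by wc -lc for all .c/.cpp files
find . \( -name "*.c" -o -name "*.cpp" \) -exec wc -lc {} + |
  awk '$3 != "total" { loc += $1; bytes += $2 }
       END { print "LOC = " loc ", SIZE = " bytes }'
```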
This is weird because counting all the files is immediate with:
The code you posted is only counting file names, not reading the files' content. To count the bytes of the files' content, you would do `find ... | xargs wc ...` -- pass the file names as arguments to `wc`.
Yes, a single pipeline with a constant number of processes (two) will be a lot faster. Opening the files, counting bytes and lines, and computing the sum all happen inside `wc`, a C program, not in Bash.
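A minimal sketch of that pipeline; `-print0`/`-0` keeps it safe for file names with spaces, and `wc` itself prints a grand-total line (one per batch, if `xargs` has to split the file list across several invocations):

```shell
# lines and bytes of all .c files; the last line of wc's output is the total
find . -name "*.c" -print0 | xargs -0 wc -lc | tail -n 1
```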
CodePudding user response:
Here is the TLDR answer based on @HolyBlackCat (see also here):
- install cloc with `sudo apt install cloc`
- from the top-level gcc directory simply run: `$ cloc .`
- then just look at the relevant info (`C`/`C++` for me):
------------------------------------------------------------
Language                files        blank      comment         code
------------------------------------------------------------
C                       45147       680546       889750      3465801
C++                     27752       230223       289596      1097375