I have several files with a large number of rows, and I'm interested in finding the sum of the numbers in the sixth column of each file.
Currently I use the following series of commands:
cat file.txt | cut -f 6 | sed "1d" | sum
And it outputs:
01667 4
Obviously this answer is wrong (when I copied everything into Excel and used its SUM function, it gave me 21693). I assume it's some calculation issue inherent in the tools, but I'm not sure how to resolve it. There are 1452 individual numbers to sum in this particular file, and I'd like to do the same on several similar files with similar row counts.
Can anyone help me figure out what's going wrong?
Answer:
$: sum --help
Usage: sum [OPTION]... [FILE]...
Print checksum and block counts for each FILE.
sum isn't what you want: the 01667 4 you're seeing is a checksum and a block count, not an arithmetic total.
Also, cat to cut to sed to something else is often an indicator that you're overdoing something; cf. this link, and more importantly, this one.
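As a quick illustration (just the cleanup, not the fix), cut can read the file directly, so the cat stage is never needed:
cut -f 6 file.txt | sed "1d"
That still leaves you piping into the wrong tool, though, which is where awk comes in.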
Take a look at awk, which can handle all of that in one call. (You might not have GNU awk, but any POSIX awk can do this.)
awk 'NR>1{tot += $6} END{print tot}' file.txt
That should handle it efficiently.
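And since you mentioned running this over several files, here is one way to get a per-file total in a single awk call (a sketch; file1.txt and file2.txt are placeholder names, and it assumes each file has one header row). FNR, unlike NR, resets at the start of every input file, so FNR>1 skips each file's header:
awk 'FNR>1 {tot[FILENAME] += $6} END {for (f in tot) print f, tot[f]}' file1.txt file2.txt
Note that the for (f in tot) loop visits the files in no guaranteed order.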
As an aside, if you are skipping row 1 because it's a header, and IF the header fields are plain strings, you can probably drop the NR test on every row, since a non-numeric string usually evaluates to zero in awk arithmetic (be careful with that assumption). I don't like doing things that way, but there is a school of thought that says simpler is always better. YMMV.
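If you do go that route, the simplified version would look like this (it relies on the header text in column 6 coercing to 0, and it breaks if that header happens to start with digits):
awk '{tot += $6} END {print tot}' file.txt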