checking that a file is not truncated

Time:05-30

I have downloaded many .gz files from the following address:

http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/

How can I check whether the files were truncated during the download (i.e. wget did not download the entire file because of a network problem)? Thanks.

CodePudding user response:

As you can see, each directory contains a file named md5sum.txt. From inside a downloaded directory that includes this file, you can run:

md5sum -c md5sum.txt  

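This calculates the hash of each file listed in md5sum.txt and compares it with the recorded value, printing OK or FAILED for each file. A truncated file will be reported as FAILED.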
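To check several downloaded directories in turn, the same command can be wrapped in a small shell function. This is only a sketch: it assumes GNU coreutils md5sum, and the name check_md5 is made up for illustration.

```shell
# Hypothetical helper: verify one downloaded directory against its md5sum.txt.
check_md5() {
    # $1 = directory holding the downloaded files and their md5sum.txt
    # The subshell keeps the caller's working directory unchanged.
    # --quiet hides the "OK" lines, so only problem files are printed.
    ( cd "$1" && md5sum --quiet -c md5sum.txt )
}
```

Calling check_md5 path/to/downloaded/dir prints nothing and returns 0 when every listed file matches; otherwise it prints the failing names and returns a non-zero status. Files that are listed in md5sum.txt but were never downloaded also count as failures unless you add GNU md5sum's --ignore-missing option.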

CodePudding user response:

How can I check whether the files were truncated during the download (i.e. wget did not download the entire file because of a network problem)?

You can use wget's spider mode, which fetches only the response headers without downloading the file, for example:

wget --spider http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/Alasoo_2018/exon/Alasoo_2018_exon_macrophage_naive.permuted.tsv.gz

which gives the output:

Spider mode enabled. Check if remote file exists.
--2022-05-30 09:38:55--  http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/Alasoo_2018/exon/Alasoo_2018_exon_macrophage_naive.permuted.tsv.gz
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.193.138
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.138|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 645718 (631K) [application/octet-stream]
Remote file exists.

Length is the size of the remote file in bytes, so comparing it with the size of your local file tells you whether the download is complete.
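The comparison can be scripted roughly as follows. This is a sketch under a few assumptions: the function names (parse_length, remote_size, same_size) are invented for illustration, wget is forced to the C locale so that the "Length:" label is not translated, and the server is assumed to report a numeric length, which not every server does.

```shell
# Pull the byte count out of the "Length:" line of wget --spider output (stdin).
parse_length() {
    awk '/^Length:/ { print $2; exit }'
}

# Print the size in bytes of the remote file, as reported by the server.
remote_size() {
    # wget writes its log to stderr, hence the 2>&1
    LC_ALL=C wget --spider "$1" 2>&1 | parse_length
}

# Succeed if the local file has exactly the expected number of bytes.
same_size() {
    # $1 = local file, $2 = expected size in bytes
    [ "$(wc -c < "$1" | tr -d ' ')" = "$2" ]
}

# Example use (url and file are placeholders you set yourself):
#   if same_size "$file" "$(remote_size "$url")"; then
#       echo "complete: $file"
#   else
#       echo "size mismatch: $file"
#   fi
```

A matching size only shows that no bytes are missing. It does not detect corrupted content, which is what a checksum comparison is for.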

If you want to download any missing parts rather than merely check for completeness, take a look at the -c option. From the wget man page:

-c

--continue

Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. (...)
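As a rough sketch, resuming a batch of downloads could look like the function below. The name resume_downloads is hypothetical, and it must be run from the directory that holds the partial files, because wget -c looks for the local copy there.

```shell
# Hypothetical helper: resume (or start) the download of each URL given.
resume_downloads() {
    for url in "$@"; do
        # -c continues from the end of an existing partial file
        # instead of starting the download over
        wget -c "$url" || echo "failed: $url" >&2
    done
}
```

Note that -c only works with servers that support resuming (for HTTP, the Range header), and that it decides what is missing from the file size alone, so it will not repair a file whose size is right but whose content is damaged.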
