I have a background download running with wget. When the file grows larger than 20 MB, I want to truncate the first 10 MB of the file. I have created this script:
if [ "$filesize" -ge "$maxSize" ]; then
    echo "Truncating..."
    kill -STOP "$pidDwn"                  # pause the wget process
    fallocate -c -o 0 -l 10M "$fileName"  # collapse the first 10 MiB of the file
    kill -CONT "$pidDwn"                  # resume the download
fi
This snippet checks the file size and removes 10 MB from the beginning of the file: I stop the wget process, use fallocate to delete the first 10 MB, and then resume the wget process so the download continues.
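For context, a minimal sketch of how the variables could be populated (the pgrep and stat calls and the file name here are illustrative assumptions, not part of the original script):
fileName="download.bin"                 # hypothetical output file of the background wget
maxSize=$((20 * 1024 * 1024))           # 20 MiB threshold, in bytes
pidDwn=$(pgrep -f "wget.*$fileName")    # PID of the background wget process
filesize=$(stat -c %s "$fileName")      # current file size in bytes (GNU stat)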
The problem is strange: if the file is 20 MB and I run fallocate WITHOUT resuming the wget process, the file stays at 10 MB; but as soon as I resume wget, the file instantly goes back to 20 MB and keeps growing as the download proceeds.
If I instead run this command after resuming the process:
sed -i 1d $fileName
the file stays at 10 MB but no longer grows with the download. It looks as if the download was interrupted, yet ps aux (listing all active processes) shows the wget process is still alive.
Any idea how to fix this?
CodePudding user response:
truncate the first 10 MB of the file
If the server you are downloading the file from supports the Partial Content feature, you can request just a range (a part) of the file. To check whether the server supports it, run
wget --spider --server-response <url_to_resource>
If the output contains
Accept-Ranges: bytes
then it does, and you can request the file starting from, say, the 100th byte using the Range header:
wget --header "Range: bytes=100-" <url_to_resource>
You should then get a 206 Partial Content response and the download will start from that offset.
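Putting the two steps together, a small sketch (the URL and file names are placeholders; --server-response prints the headers on stderr, hence the 2>&1):
url='https://somewhere/somefile.pdf'    # placeholder
if wget --spider --server-response "$url" 2>&1 | grep -qi 'Accept-Ranges: bytes'; then
    wget --header "Range: bytes=10485760-" -O tail.part "$url"   # skip the first 10 MiB
fi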
CodePudding user response:
fallocate (or any other program) won't be able to move the file offset (the SEEK position) that wget is currently using for writing, so you can't do it that way: as soon as wget writes again at its old offset, the filesystem extends the file back to its previous size, leaving a hole where the collapsed data was. (Your sed -i experiment fails for a related reason: sed -i writes a new file and renames it over the original, so wget keeps writing to the old, now-deleted inode and the file you see never grows.)
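You can reproduce this behaviour without wget at all: a shell file descriptor keeps its write offset across the collapse in the same way. A minimal demonstration, assuming a filesystem with collapse-range support such as ext4 or XFS, and demo.bin as a scratch file:
exec 3>demo.bin                    # keep fd 3 open for writing, offset 0
head -c 20M /dev/zero >&3          # write 20 MiB; fd 3's offset is now 20 MiB
fallocate -c -o 0 -l 10M demo.bin  # collapse the first 10 MiB
stat -c %s demo.bin                # 10485760 -- the file really is 10 MiB now
echo resumed >&3                   # the next write still lands at offset 20 MiB...
stat -c %s demo.bin                # ...so the size jumps back past 20 MiB (with a hole)
exec 3>&-                          # close fd 3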
A possible work-around would be to use tail for getting rid of all but the last 20 MiB of the file:
wget 'https://somewhere/somefile.pdf' -O - |
tail -c "$((20 * 1024 * 1024))" > last20MiB.out
remark: tail -c on a pipe can't emit anything until the stream ends, so last20MiB.out only materializes once the download completes. And frankly, I don't know what you'll be able to do with a file whose header has been stripped.
Update
What you're trying to do isn't trivial because it requires a low-level file API, but here's a solution with perl.
The perl program writes the last 20 MiB of the stream to a file for every 10 MiB of downloaded data; the first write happens after 30 MiB of input data.
curl 'https://somewhere/somefile.pdf' |
perl -e '
    use bytes;
    $max_file_size = 20 * 1024 * 1024;  #=> 20 MiB
    $buffer = "";
    $buffer_size = 0;
    $pending_bytes = 0;                 # plain 0/1 flag
    while ($bytes = read(STDIN, $data, 1048576)) {
        $buffer .= $data;
        $buffer_size += $bytes;         # accumulate the total buffered bytes
        $pending_bytes = 1;
        # every time 1.5 * 20 MiB have piled up, keep the last 20 MiB and flush
        if ($buffer_size >= 1.5 * $max_file_size) {
            $buffer = substr $buffer, ($buffer_size - $max_file_size), $max_file_size;
            $buffer_size = $max_file_size;
            open(FH, ">:raw", $ARGV[0]) or die "cannot open $ARGV[0]: $!";
            print FH $buffer;
            close(FH);
            $pending_bytes = 0;
        }
    }
    # flush whatever arrived after the last full write
    if ($pending_bytes) {
        open(FH, ">:raw", $ARGV[0]) or die "cannot open $ARGV[0]: $!";
        print FH substr($buffer, ($buffer_size > $max_file_size ? $buffer_size - $max_file_size : 0), $max_file_size);
        close(FH);
    }
' last20MiB.out
remark: I tried to use truncate FH, 0 instead of opening/closing the output file for each write, but it didn't work, most likely because truncate doesn't move the filehandle's write position, so the next print lands at the old offset; a seek FH, 0, 0 before each write would be needed as well.
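As a side note, the same pipeline works with wget (as used in the question), since wget can also write to stdout; saving the one-liner to a file keeps the command readable (rolling_tail.pl is a hypothetical name for the perl program above):
wget -q 'https://somewhere/somefile.pdf' -O - |
perl rolling_tail.pl last20MiB.out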