I need to create a text file that includes just the dot symbol "." on every line, repeatedly, until a specific number of lines stored in a variable, is reached. I'm now using a while loop, but those files with dots need to be around 0.5-5 million lines. Therefore, it takes a bit longer than I would like it to. Below is my current code:
j=0
while [[ $j != $length ]]
do
    echo "." >> "$file"
    ((j++))
done
So my question is: Is there a more efficient way of creating a file with x number of lines that each contain the same character (or string) repeating, other than using a while loop?
Thanks,
CodePudding user response:
You can use yes and head:
yes . | head -n "$length" > "$file"
This should be dramatically faster than repeatedly opening and closing the file to write two bytes at a time.
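For example, with the two variables from the question (the values here are just placeholders):

```shell
# Placeholder values; length and file come from the question
length=1000000
file=dots.txt

# yes emits "." endlessly; head cuts the stream off after $length lines
yes . | head -n "$length" > "$file"

# Sanity check: should report $length lines
wc -l "$file"
```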
CodePudding user response:
Using dd to write to the output file (took less than 2 secs):
time yes . | dd of=dotbig.txt count=1024 bs=1048576 iflag=fullblock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.76116 s, 610 MB/s
real 0m1.814s
user 0m0.076s
sys 0m0.686s
Count of lines:
wc -l dotbig.txt
536870912 dotbig.txt
Contents sample:
head -n 3 dotbig.txt ; tail -n 3 dotbig.txt
.
.
.
.
.
.
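Note that dd counts blocks and bytes, not lines, so the line count above just falls out of the byte count (two bytes per ".\n" line). If you need exactly $length lines rather than a round number of bytes, one option is to let head cap the stream while dd still does the large buffered writes; a sketch, assuming GNU dd (for the status=none flag):

```shell
# Placeholder values
length=1000000
file=dots.txt

# head stops the stream at exactly $length lines;
# dd writes them out in large 1 MiB blocks
yes . | head -n "$length" | dd of="$file" bs=1048576 status=none

wc -l "$file"   # should report exactly $length lines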
CodePudding user response:
The most resource-intensive piece of this code is the redirection (echo "." >> $file). To get around this you will want to "build" a string and redirect to $file only once, rather than $length times.
j=0
while [[ $j != $length ]]
do
    builder=${builder}$'.\n'
    ((j++))
done
printf '%s' "$builder" > "$file"
However, you are still in a loop, which probably isn't the best use of resources. To get around this, let's take inspiration from this answer:
printf '.\n%.0s' $(seq $length) > $file
Note that here we use $(seq $length) rather than {1..$length}, since bash does not expand {1..$length} to 1 2 3 4 5 6 7 8 9 10 if length is 10 (see this question).
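To see the format-reuse trick at work on a small input (a length of 5 here is purely for illustration):

```shell
length=5
# printf re-applies its format string once per argument from seq;
# %.0s consumes the argument but prints zero characters of it,
# so each argument contributes exactly one ".\n"
printf '.\n%.0s' $(seq "$length")
```

This prints five lines, each containing a single dot.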
CodePudding user response:
Repetitive actions in bash (eg, via a loop) are always going to be slow, if only due to the overhead of spinning up a new OS process (for each pass through the loop) for each command within the loop. In this case there's going to be additional overhead for opening and closing the output file on each pass through the loop.
You want to look for a solution that limits the number of OS processes that need to be created/closed (and in this case limit the number of times you open/close the output file). There are going to be a lot of options depending on what software/tool/binary you want to use.
One awk idea:

awk -v len="${length}" 'BEGIN {for (i=1; i<=len; i++) print "."}' > newfile
While this does use a 'loop' within awk, we're only looking at a single OS process at the bash level, and we're only opening/closing the output file once.
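The same one-process idea generalizes to any repeated string; str below is an assumed extra variable, not something from the question:

```shell
# Placeholder values; str is a hypothetical variable holding the repeated text
length=1000000
str="."

# One awk process, one open/close of the output file
awk -v len="$length" -v s="$str" 'BEGIN {for (i=1; i<=len; i++) print s}' > newfile
```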
CodePudding user response:
This doubles the size of the file on each pass. Maybe it's more efficient than some of the other solutions, maybe not. File "b" keeps doubling in size until another doubling would take it past $length lines. When $length is a power of 2, I think this would be pretty efficient.
let n=2
let length=1000000
echo '.' > a
cat a a > b
rm a
while [[ $((n*2)) -le $length ]]; do
    mv b a
    cat a a > b
    rm a
    let n=n*2
done
# do something here to fill out the remaining length-n lines
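One way to fill out that remainder is to borrow the yes/head trick from an earlier answer. After the loop, b holds n lines (the largest power of two not exceeding $length), so the shortfall is length-n; a sketch:

```shell
# b holds n lines at this point; append the missing length-n lines
if (( length > n )); then
    yes . | head -n $(( length - n )) >> b
fi
wc -l b   # should now report $length lines
```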