I need to create a text file that includes just the dot symbol "." on every line, repeatedly, until a specific number of lines stored in a variable, is reached. I'm now using a while loop, but those files with dots need to be around 0.5-5 million lines. Therefore, it takes a bit longer than I would like it to. Below is my current code:
j=0
while [[ $j != $length ]]
do
    echo "." >> "$file"
    ((j++))
done
So my question is: Is there a more efficient way of creating a file with x number of lines that each contain the same character (or string) repeating, other than using a while loop?
Thanks,
CodePudding user response:
You can use yes and head:
yes . | head -n "$length" > "$file"
This should be dramatically faster than repeatedly opening and closing the file to write two bytes at a time.
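For example, with the two variables from the question (the values here are just placeholders):

```shell
# Placeholder values; length and file come from the question
length=1000000
file=dots.txt

# yes emits "." endlessly; head cuts the stream off after $length lines
yes . | head -n "$length" > "$file"

# Sanity check: should report $length lines
wc -l "$file"
```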
CodePudding user response:
Using dd to write to the output file (took less than 2 secs):
time yes . | dd of=dotbig.txt count=1024 bs=1048576 iflag=fullblock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.76116 s, 610 MB/s
real 0m1.814s
user 0m0.076s
sys 0m0.686s
Count of lines:
wc -l dotbig.txt
536870912 dotbig.txt
Contents sample:
head -n 3 dotbig.txt ; tail -n 3 dotbig.txt
.
.
.
.
.
.
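Note that dd counts blocks and bytes, not lines, so the line count above just falls out of the byte count (two bytes per ".\n" line). If you need exactly $length lines rather than a round number of bytes, one option is to let head cap the stream while dd still does the large buffered writes; a sketch, assuming GNU dd (for the status=none flag):

```shell
# Placeholder values
length=1000000
file=dots.txt

# head stops the stream at exactly $length lines;
# dd writes them out in large 1 MiB blocks
yes . | head -n "$length" | dd of="$file" bs=1048576 status=none

wc -l "$file"   # should report exactly $length lines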
CodePudding user response:
The most resource-intensive piece of this code is the redirection (echo "." >> $file). To get around this you will want to "build" a string and redirect to $file only once, rather than $length times.
j=0
while [[ $j != $length ]]
do
    builder=${builder}$'.\n'
    ((j++))
done
printf '%s' "$builder" > "$file"
However, you are still in a loop, which probably isn't the best use of resources. To get around this, let's take inspiration from this answer:
printf '.\n%.0s' $(seq $length) > $file
Note that here we use $(seq $length) rather than {1..$length}, since bash does not expand {1..$length} to 1 2 3 4 5 6 7 8 9 10 if length is 10 (see this question).
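To see the format-reuse trick at work on a small input (a length of 5 here is purely for illustration):

```shell
length=5
# printf re-applies its format string once per argument from seq;
# %.0s consumes the argument but prints zero characters of it,
# so each argument contributes exactly one ".\n"
printf '.\n%.0s' $(seq "$length")
```

This prints five lines, each containing a single dot.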
CodePudding user response:
Repetitive actions in bash (eg, via a loop) are always going to be slow, if only due to the overhead of spinning up a new OS process (for each pass through the loop) for each command within the loop. In this case there's going to be additional overhead for opening and closing the output file on each pass through the loop.
You want to look for a solution that limits the number of OS processes that need to be created/closed (and in this case limit the number of times you open/close the output file). There are going to be a lot of options depending on what software/tool/binary you want to use.
One awk idea:

awk -v len="${length}" 'BEGIN {for (i=1; i<=len; i++) print "."}' > newfile
While this does use a 'loop' within awk, we're only looking at a single OS process at the bash level, and we're only opening/closing the output file once.
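The same one-process idea generalizes to any repeated string; str below is an assumed extra variable, not something from the question:

```shell
# Placeholder values; str is a hypothetical variable holding the repeated text
length=1000000
str="."

# One awk process, one open/close of the output file
awk -v len="$length" -v s="$str" 'BEGIN {for (i=1; i<=len; i++) print s}' > newfile
```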
CodePudding user response:
This doubles the size of the file on each pass. Maybe it's more efficient than some of the other solutions, maybe not. File "b" keeps doubling in size until another doubling would take it past $length lines. When $length is a power of 2, I think this would be pretty efficient.
let n=2
let length=1000000
echo '.' > a
cat a a > b
rm a
while [[ $((n*2)) -le $length ]]; do
    mv b a
    cat a a > b
    rm a
    let n=n*2
done
# do something here to fill out the remaining length-n lines
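One way to fill out that remainder is to borrow the yes/head trick from an earlier answer. After the loop, b holds n lines (the largest power of two not exceeding $length), so the shortfall is length-n; a sketch:

```shell
# b holds n lines at this point; append the missing length-n lines
if (( length > n )); then
    yes . | head -n $(( length - n )) >> b
fi
wc -l b   # should now report $length lines
```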