How do I add an integer to each element of an array in bash?

Time:11-09

I would like to add 1 to each element of a bash array. Say I have an array:

array1=(5 1 7 9 4)

Ideally, I would like to get this:

echo ${array2[@]}

5 6 1 2 7 8 9 10 4 5

That is, each number is followed by its successor. I wasn't sure this was possible, so failing that, I would settle for generating the array:

echo ${array3[@]}

6 2 8 10 5

I can achieve the former with a for loop:

for i in ${array1[@]}; do array2+=($i `expr $i + 1`); done

But this is very slow (my array has 20 million elements), so I was wondering whether there is a more direct way of doing this.

Answer:

While sticking to pure bash, the obvious performance enhancement is to drop the subshell invoking the external command expr:

for i in "${array1[@]}"; do array2+=("$i" "$(( i + 1 ))"); done

Every subshell requires fork()ing a new process. Every invocation of expr, or any other command that isn't built into the shell, requires execve()'ing a separate executable. Both these things are slow enough that they should never be done in a performance-sensitive tight loop.
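As a sketch of the same idea applied to both arrays from the question, the loop below builds array2 (each value followed by its successor) and array3 (each value plus one) using only arithmetic expansion, which bash evaluates in-process without forking. It assumes bash and the sample array1 from the question; it has not been timed on 20 million elements, and the loop is still interpreted one element at a time.

```shell
#!/usr/bin/env bash
# Sample input from the question
array1=(5 1 7 9 4)

array2=()   # each element followed by its successor
array3=()   # each element plus one
for i in "${array1[@]}"; do
    # $(( ... )) is arithmetic expansion, not command substitution:
    # no subshell and no external command are involved
    array2+=("$i" "$(( i + 1 ))")
    array3+=("$(( i + 1 ))")
done

echo "${array2[@]}"   # 5 6 1 2 7 8 9 10 4 5
echo "${array3[@]}"   # 6 2 8 10 5
```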

That said, I second the proposal that bash is a poorly chosen language for handling the volume of data you're trying to process. Rust, Julia, Go, awk... there are lots of other choices.
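One possible way to follow that advice while still ending up with bash arrays is to hand the arithmetic to a single awk process and read its output back. This is a sketch assuming bash 4 or later (for mapfile) and process substitution. Because printf is a shell builtin, passing every element to it should not hit the exec argument-length limit, but memory use and speed for 20 million elements have not been measured here.

```shell
#!/usr/bin/env bash
# Sample input from the question
array1=(5 1 7 9 4)

# printf (a builtin) streams the array, one value per line, into one awk
# process; mapfile (bash >= 4) reads awk's output lines back into an array.
mapfile -t array2 < <(printf '%s\n' "${array1[@]}" | awk '{ print $1; print $1 + 1 }')
mapfile -t array3 < <(printf '%s\n' "${array1[@]}" | awk '{ print $1 + 1 }')

echo "${array2[@]}"
echo "${array3[@]}"
```

The design choice here is to pay for one fork/exec of awk per array instead of one per element.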

Answer:

With 20 million elements, you cannot afford much work per element. As someone else said, the shell does not seem the best tool, both because interpreting is slow and because it is doubtful that it would cope well with that many elements in a single array. A tool such as awk would be much faster.

The solution proposed below is an attempt to meet your unstated needs: it processes each item from an input stream individually and independently, and writes each result to the output immediately. It does not build a memory-resident monster unless you really want to keep the data and process it afterwards in the AWK "END" block.
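Stripped of the sample-file setup, the streaming idea reduces to a single awk call that never builds an array. A minimal sketch, assuming the numbers already sit one per line in a file; numbers.txt and pairs.txt are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: one integer per line in numbers.txt
printf '%s\n' 5 1 7 9 4 > numbers.txt

# awk reads one line at a time and writes each result immediately,
# so memory use does not grow with the number of input lines
awk '{ print $1, $1 + 1 }' numbers.txt > pairs.txt   # hypothetical output file

cat pairs.txt
```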

#!/bin/sh

BASE=`basename "$0" ".sh" `
SAMPLE_INPUT_STREAM="${BASE}.input"

# Create a small sample input file, one value per line
cat >"${SAMPLE_INPUT_STREAM}" <<-!EnDoFiNpUt
5
1
7
9
4
!EnDoFiNpUt

awk 'BEGIN{
    split( "", shiftPairing ) ;    # start with an empty array
    pos=0 ;
}
{
    pos++ ;

#   ### Memory-resident approach (keeps every pair for the END block)
#   shiftPairing[pos,1]=$1 ;
#   shiftPairing[pos,2]=$1+1 ;
#   printf("[%d|%s|%s]\n", pos, shiftPairing[pos,1], shiftPairing[pos,2] ) ;

    ### Memory-avoiding approach (prints each pair as soon as it is read)
    printf("[%d|%s|%s]\n", pos, $1, $1+1 ) ;

#}END{
#   print "\nPost-Processing ..." ;
#   for( i=1 ; i<=pos ; i++ ){
#       printf("[%d|%s|%s]\n", i, shiftPairing[i,1], shiftPairing[i,2] ) ;
#   } ;
}' "${SAMPLE_INPUT_STREAM}"

The session output would look like this:

ericthered@OasisMega1:/0__WORK$ ./test_44.sh
[1|5|6]
[2|1|2]
[3|7|8]
[4|9|10]
[5|4|5]
ericthered@OasisMega1:/0__WORK$

Or, for the memory-resident logic with post-processing:

ericthered@OasisMega1:/0__WORK$ ./test_44.sh
[1|5|6]
[2|1|2]
[3|7|8]
[4|9|10]
[5|4|5]

Post-Processing ...
[1|5|6]
[2|1|2]
[3|7|8]
[4|9|10]
[5|4|5]
ericthered@OasisMega1:/0__WORK$

This report keeps the genomic sequence number on the same line as the two strand identifiers, side by side, in case you need to search the output by position. The output can be redirected wherever you want it. In this example I have used square brackets, but for my own use I always use vertical bars as the single, identifiable delimiter; that can be changed in the printf statement.
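For instance, a variant of the report that uses vertical bars throughout might look like the sketch below. It uses awk's built-in record counter NR instead of the hand-maintained pos variable, which gives the same numbering as long as there is a single input file; sample.input and report.txt are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sample input file, one value per line
printf '%s\n' 5 1 7 9 4 > sample.input

# position (NR), original value, value plus one, separated only by "|"
awk '{ printf("%d|%s|%s\n", NR, $1, $1 + 1) }' sample.input > report.txt

cat report.txt
```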
