Can text be sorted twice?

Time:11-26

I have an awk array that aggregates bytes uploaded and downloaded. I can sort the output by either bytes down or bytes up and pipe that to head for the top talkers. Is it possible to output two sorts using different keys?

 zgrep ^1 20211014T00*.gz|awk '{print$3,$11,$6,$(NF-7)}'| awk 'NR>1{bytesDown[$1 " " $2] =$3;bytesUp[$1 " " $2] =$4} END {for(i in bytesDown) print bytesDown[i], bytesUp[i], i}'|sort -rn|head

Rather than parsing the source again to get the top uploads, I would like to be able to output the array again to "sort -rnk2|head".

I can see how I'd do it with a scratch file but is it possible/desirable to do it in memory? It's a bash shell on a 2 CPU Linux VM with 4GB of memory.

CodePudding user response:

Bash allows you to do that with process substitutions. It's not clear what you expect it to do with the data; printing both results to standard output is unlikely to be useful, so I send each to a separate file for later inspection.

zgrep ^1 20211014T00*.gz | 
awk '{print$3,$11,$6,$(NF-7)}' |
awk 'NR>1{bytesDown[$1 " " $2] =$3;bytesUp[$1 " " $2] =$4}
  END {for(i in bytesDown) print bytesDown[i], bytesUp[i], i}' |
tee >(sort -rn | head >first) |
sort -rnk2 | head >second

The double Awks could easily be refactored to a single Awk script. Something like this?

awk 'NR>1{bytesDown[$3 " " $11] =$6;bytesUp[$3 " " $11] =$(NF-7)}
    END { for(i in bytesDown) print bytesDown[i], bytesUp[i], i }'
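
If you would rather not write any files, another option is to hold the aggregated lines in a shell variable and sort that twice. This is only a minimal sketch: the sample data and the variable names (agg, top_down, top_up) are made up for illustration, and in practice agg would be filled from the zgrep/awk pipeline above. The aggregated output is normally far smaller than the raw logs, so keeping it in memory should be fine on a 4GB VM.

```shell
#!/usr/bin/env bash
# Hypothetical sample of aggregated "bytesDown bytesUp src dst" lines.
# In real use: agg=$(zgrep ... | awk ...)
agg=$(printf '%s\n' \
  '500 10 10.0.0.1 10.0.0.9' \
  '200 90 10.0.0.2 10.0.0.9' \
  '900 40 10.0.0.3 10.0.0.9')

# Sort the same in-memory data twice, once per key.
top_down=$(printf '%s\n' "$agg" | sort -rn | head -n 10)
top_up=$(printf '%s\n' "$agg" | sort -rn -k2,2 | head -n 10)

printf 'Top by bytes down:\n%s\n\n' "$top_down"
printf 'Top by bytes up:\n%s\n' "$top_up"
```

The source is parsed only once; each sort reads the saved text from the variable.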

CodePudding user response:

Your question isn't clear and there's no sample input/output to test with but this MAY be what you're trying to do:

zgrep '^1' 20211014T00*.gz|
awk '
    NR > 1 {
        key = $3 " " $11
        bytesDown[key]  = $6
        bytesUp[key]  = $(NF-7)
    }
    END {
        cmd = "sort -rn | head"
        for ( key in bytesDown ) {
            print bytesDown[key], bytesUp[key], key | cmd
        }
        close(cmd)

        cmd = "sort -rnk2 | head"
        for ( key in bytesDown ) {
            print bytesDown[key], bytesUp[key], key | cmd
        }
        close(cmd)
    }
'

If you only need the single largest entry for each key rather than the top ten, you can skip sort entirely and track the maxima while reading the input:

zgrep '^1' 20211014T00*.gz|
awk '
    NR > 1 {
        key = $3 " " $11
        bytesdown[key]  = $6
        bytesup[key]  = $(NF-7)
        if ( NR == 2 ) {
            max_bytesdown_key = key
            max_bytesup_key = key
        }
        else {
            if ( bytesdown[key] > bytesdown[max_bytesdown_key] ) {
                max_bytesdown_key = key
            }
            if ( bytesup[key] > bytesup[max_bytesup_key] ) {
                max_bytesup_key = key
            }
        }
    }
    END {
        print bytesdown[max_bytesdown_key], bytesup[max_bytesdown_key], max_bytesdown_key
        print bytesdown[max_bytesup_key], bytesup[max_bytesup_key], max_bytesup_key
    }
'
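
Since there is no sample input in the question, here is a small sketch for trying out the print-to-command approach (the first script in this answer) on made-up data. The function name two_sorts and the three-column "key down up" input format are assumptions for illustration only. The point it shows is that close(cmd) makes awk wait for the first sort to finish, so the second listing always comes after the first.

```shell
#!/usr/bin/env bash
# two_sorts: hypothetical helper. Reads "key down up" lines on stdin and
# prints the rows ordered by down, followed by the rows ordered by up.
two_sorts() {
  awk '
    { down[$1] = $2; up[$1] = $3 }
    END {
      cmd = "sort -rn | head -n 10"
      for (k in down) print down[k], up[k], k | cmd
      close(cmd)   # wait for the first sort to finish before starting the next

      cmd = "sort -rn -k2,2 | head -n 10"
      for (k in down) print down[k], up[k], k | cmd
      close(cmd)
    }'
}

# Made-up sample data.
printf '%s\n' 'a 500 10' 'b 200 90' 'c 900 40' | two_sorts
```

Both listings are produced from the arrays already held in awk's memory, so the input is read only once.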