How to sort data based on the value of a column for part (multiple lines) of a file?

Time:04-01

My data in the file file1 look like

3
0
2 0.5
1 0.8
3 0.2
3
1
2 0.1
3 0.8
1 0.4
3
2
1 0.8
2 0.4
3 0.3

Each block has the same number of rows (here it is 2 + 3 = 5). In each block, the first two lines are headers; the next 3 rows have two columns, where the first column is the label, a number from 1 to 3. I want to sort the rows in each block based on the value of the first column (excluding the first two rows). So the expected result is

3 
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3

I tried sort -k 1 -n file1, but that sorts the file as a whole and gives me the wrong result:

0
1
2
3
3
3
2 0.1
3 0.2
3 0.3
1 0.4
2 0.4
2 0.5
1 0.8
1 0.8
3 0.8

This is not the expected result.

How to sort each block separately is still a problem for me. I think awk might be able to do this. Please give some suggestions.

CodePudding user response:

Apply the DSU (Decorate/Sort/Undecorate) idiom using any awk, sort, and cut:

$ awk -v OFS='\t' '
    NF<pNF || NR==1 { blockNr++ }
    { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }
' file |
sort -n -k1,1 -k2,2 -k4,4 -k3,3 |
cut -f5-
3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3

To understand what that's doing, just look at the first 2 steps:

$ awk -v OFS='\t' 'NF<pNF || NR==1{ blockNr++ } { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }' file
1       1       1       1       3
1       1       2       2       0
1       2       3       2       2 0.5
1       2       4       1       1 0.8
1       2       5       3       3 0.2
2       1       6       6       3
2       1       7       7       1
2       2       8       2       2 0.1
2       2       9       3       3 0.8
2       2       10      1       1 0.4
3       1       11      11      3
3       1       12      12      2
3       2       13      1       1 0.8
3       2       14      2       2 0.4
3       2       15      3       3 0.3

$ awk -v OFS='\t' 'NF<pNF || NR==1{ blockNr++ } { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }' file |
    sort -n -k1,1 -k2,2 -k4,4 -k3,3
1       1       1       1       3
1       1       2       2       0
1       2       4       1       1 0.8
1       2       3       2       2 0.5
1       2       5       3       3 0.2
2       1       6       6       3
2       1       7       7       1
2       2       10      1       1 0.4
2       2       8       2       2 0.1
2       2       9       3       3 0.8
3       1       11      11      3
3       1       12      12      2
3       2       13      1       1 0.8
3       2       14      2       2 0.4
3       2       15      3       3 0.3

Notice that the awk command is just creating the key values that sort needs: block number, field count, line number, and $1. So awk Decorates the input, sort Sorts it, and cut Undecorates it by removing the decoration columns that the awk script added.
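Putting the three stages together, here is a self-contained sketch of the pipeline above that you can run as-is (the /tmp file names are my choice for the demo, not part of the answer):

```shell
# Recreate the sample input from the question.
cat > /tmp/file1 <<'EOF'
3
0
2 0.5
1 0.8
3 0.2
3
1
2 0.1
3 0.8
1 0.4
3
2
1 0.8
2 0.4
3 0.3
EOF

# Decorate: prefix each line with block number, field count,
# line number, and the sort key ($1 for data rows, NR for headers).
awk -v OFS='\t' '
    NF<pNF || NR==1 { blockNr++ }
    { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }
' /tmp/file1 |
# Sort: by block, headers before data, then by key, then by line number.
sort -n -k1,1 -k2,2 -k4,4 -k3,3 |
# Undecorate: drop the four helper columns.
cut -f5- > /tmp/file1.sorted

cat /tmp/file1.sorted
```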

CodePudding user response:

You can use asort() and arrays in GNU awk (gawk):

awk 'NF==1 && a[1]{
        n=asort(a);
        for(k=1; k<=n; k++){print a[k]};
        delete a; i=1
    }NF==1{print}
    NF==2{a[i]=$0; ++i}
    END{n=asort(a); for(k=1; k<=n; k++){print a[k]}}
' file1

you get

3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3
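Note that asort() is a gawk extension, so the script above needs GNU awk. As a portable alternative (my sketch, not part of either answer), plain POSIX awk can buffer each block's data rows and sort them itself, e.g. with a small insertion sort keyed on the numeric value of the first column:

```shell
# Recreate the sample input from the question.
cat > /tmp/file1 <<'EOF'
3
0
2 0.5
1 0.8
3 0.2
3
1
2 0.1
3 0.8
1 0.4
3
2
1 0.8
2 0.4
3 0.3
EOF

awk '
    function dump_block(   i, j, tmp) {
        # Insertion sort on the numeric value of column 1
        # (in awk, "2 0.5"+0 evaluates to 2).
        for (i = 2; i <= n; i++) {
            tmp = buf[i]
            for (j = i - 1; j >= 1 && buf[j]+0 > tmp+0; j--)
                buf[j+1] = buf[j]
            buf[j+1] = tmp
        }
        for (i = 1; i <= n; i++) print buf[i]
        n = 0
    }
    NF==1 { dump_block(); print; next }  # header: flush buffered block first
    { buf[++n] = $0 }                    # data row: buffer it
    END { dump_block() }                 # flush the last block
' /tmp/file1 > /tmp/file1.sorted2

cat /tmp/file1.sorted2
```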