I'm trying to optimize a search for the highest-ranking polynomials (https://maths-people.anu.edu.au/~brent/pd/Murphy-thesis.pdf) in a list containing 500k lines of data. The list is in groups of 12 lines, with each group in the following format:
n: 533439167600904850230361756102700151678687933392166847323827307497363839257031077774321424872955045754669625577486179222154434651598903112919949771321416511589029559325246084363632977829645558547714072241
Y0: -2185827644152440194843077528225522129878
Y1: 119181810251841490251547
c0: 520196368294236390929241313007470334962
c1: 96360506527052960901419060941213412645
c2: 43791634664623702231347384357
c3: -9285559657533242039560613517
c4: 563452403603161952
c5: -21637936320
skew: 137792.000
lognorm 67.52, exp_E 62.03, alpha -1.81 (proj -2.68), 3 real roots
n: 533439167600904850230361756102700151678687933392166847323827307497363839257031077774321424872955045754669625577486179222154434651598903112919949771321416511589029559325246084363632977829645558547714072241
Y0: -2185827643535814056463203098120423438934
Y1: 1185320029877707674463
c0: 2018231558989478149929124495499518870153
c1: 877408379299126273318698618329767851376
c2: -103500370253681428439107986294
c3: -8603519648746439934492486528
c4: 220583232537944759
c5: -12839506680
skew: 431744.000
lognorm 68.01, exp_E 62.61, alpha 0.09 (proj -1.93), 3 real roots
How can I sort these groups by the value of a given parameter (either lognorm or exp_E)?
Answer:
I don't think the sort command will do what you want without "help": it sorts line by line, and each record here spans several lines.
So:
- combine all 12 lines of a group into one superstring
- prefix that string with the two potential sort fields (lognorm and exp_E)
- sort as desired
- convert back into the original format
The following is not the most efficient script, but it should be fairly easy to understand:
# combine each 12-line group into one super string
# precede each super string with the two potential sort fields
gawk '
    BEGIN { del = "^" }               ## delimiter; assumed not to occur in the data
    $0 == "" { next }                 ## skip blank line
    { all = all $0 del }              ## build up combo string
    /lognorm/ {
        L = $2                        ## lognorm value, e.g. "67.52,"
        E = $4                        ## exp_E value, e.g. "62.03,"
        sub(",", "", L)               ## strip the trailing comma
        sub(",", "", E)
        print L, E, all               ## copy two potential sort fields to front of the string
        all = ""
    }' "$1" |
sort -n -k1,1 |                       ## or -k2,2 ### now we sort on desired field
gawk '{
    gsub(/[\^]/, "\n")                # replace ^ with newline
    sub(/^[^ ]* [^ ]* /, "")          # strip first two fields (we added above)
    print $0
}'
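To check the approach quickly, the same combine/sort/split pipeline can be tried on a tiny sample. The sketch below is only an illustration under a few assumptions: the function name `sort_polys` is hypothetical, the two shortened records are made up for the demo, plain `awk` is used in place of `gawk`, `LC_ALL=C` is set so that `sort -n` reads `.` as the decimal point, and `^` is assumed never to occur in the data.

```shell
#!/bin/sh
# sort_polys KEY : hypothetical helper; reads records on stdin.
# KEY=1 sorts on lognorm, KEY=2 sorts on exp_E (ascending).
sort_polys() {
    key=${1:-1}
    awk -v del='^' '
        $0 == "" { next }                    # skip blank lines
        { all = all $0 del }                 # join the record into one string
        /^lognorm/ {                         # last line of each record
            L = $2; E = $4
            sub(",", "", L); sub(",", "", E) # drop trailing commas
            print L, E, all                  # prefix the two sort fields
            all = ""
        }' |
    LC_ALL=C sort -n -k"$key","$key" |
    awk '{
        sub(/^[^ ]* [^ ]* /, "")             # strip the two prefixed fields
        gsub(/\^/, "\n")                     # restore the line breaks
        printf "%s", $0                      # $0 already ends with a newline
    }'
}

# Made-up, shortened records (not real polynomial data).
sample='skew: 2.000
lognorm 68.01, exp_E 61.00, alpha 0.09 (proj -1.93), 3 real roots
skew: 1.000
lognorm 67.52, exp_E 62.03, alpha -1.81 (proj -2.68), 3 real roots'

printf '%s\n' "$sample" | sort_polys 1      # record with lognorm 67.52 comes first
```

Called as `sort_polys 2 < polys.txt` (file name hypothetical), it would sort on exp_E instead; adding `-r` to the `sort` call reverses the order.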