Sort groups of 12 lines based on one value

Time:09-21

I'm trying to optimize a search for the highest-ranking polynomials (https://maths-people.anu.edu.au/~brent/pd/Murphy-thesis.pdf) in a list containing 500k lines of data. The list is organized in groups of 12 lines, each in the following format:

n: 533439167600904850230361756102700151678687933392166847323827307497363839257031077774321424872955045754669625577486179222154434651598903112919949771321416511589029559325246084363632977829645558547714072241
Y0: -2185827644152440194843077528225522129878
Y1: 119181810251841490251547
c0: 520196368294236390929241313007470334962
c1: 96360506527052960901419060941213412645
c2: 43791634664623702231347384357
c3: -9285559657533242039560613517
c4: 563452403603161952
c5: -21637936320
skew: 137792.000
lognorm 67.52, exp_E 62.03, alpha -1.81 (proj -2.68), 3 real roots

n: 533439167600904850230361756102700151678687933392166847323827307497363839257031077774321424872955045754669625577486179222154434651598903112919949771321416511589029559325246084363632977829645558547714072241
Y0: -2185827643535814056463203098120423438934
Y1: 1185320029877707674463
c0: 2018231558989478149929124495499518870153
c1: 877408379299126273318698618329767851376
c2: -103500370253681428439107986294
c3: -8603519648746439934492486528
c4: 220583232537944759
c5: -12839506680
skew: 431744.000
lognorm 68.01, exp_E 62.61, alpha 0.09 (proj -1.93), 3 real roots

How can I sort these groups by the value of a given parameter (either lognorm or exp_E)?

CodePudding user response:

I don't think the sort command will do what you want without some help, since it compares individual lines and each of these records spans several lines.
So:

  • combine all 12 lines of a group into one superstring
  • prefix the string with the two candidate sort fields (lognorm and exp_E)
  • sort on the desired field
  • convert back into the original format

The following is not the most efficient script, but it should be fairly easy to understand. It takes the input file as its first argument:

#  combine 12 lines into one super string
#  precede each super string with the two potential sort fields
gawk '
BEGIN{del="^"}
$0==""{next}  ## skip blank line
{all=all $0 del}  ## build up combo string
/lognorm/{
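  L=$2           ## lognorm value, still with its trailing comma
  E=$4           ## exp_E value, still with its trailing comma
  sub(",","",L)  ## strip the comma from each field
  sub(",","",E)
  print L,E,all  ## copy two potential sort fields to front of the string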
  all=""
}' "$1" |
sort -n -k1,1 | ## or -k2,2  ### now we sort on desired field
gawk '{
  gsub(/[\^]/, "\n")           # replace ^ with newline
  sub(/^[^ ]* [^ ]* /, "")  # strip first two fields (we added above)
  print $0
}'
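
As a variation on the script above, here is a minimal sketch that relies on awk's paragraph mode (RS set to the empty string), so each blank-line-separated group arrives as a single record. The function name sort_groups, its two arguments, and the \001 placeholder byte are assumptions made for illustration, not part of the original answer. It also assumes the data contains no tab or \001 characters.

```shell
# Hypothetical helper: sort_groups KEY FILE
# KEY is a label from the last line of each group, e.g. lognorm or exp_E.
sort_groups() {
  key=$1
  file=$2
  awk -v key="$key" '
    BEGIN { RS = "" }   # paragraph mode: one blank-line-separated group per record
    {
      # find "KEY <number>" anywhere in the group
      if (match($0, key " -?[0-9.]+")) {
        val = substr($0, RSTART + length(key) + 1, RLENGTH - length(key) - 1)
        rec = $0
        gsub(/\n/, "\001", rec)        # flatten the group onto one line
        print val "\t" rec "\001"      # decorate: sort value, tab, flattened group
      }
    }' "$file" |
  LC_ALL=C sort -n -k1,1 |             # numeric sort on the decorated value
  cut -f2- |                           # undecorate: drop the sort value
  tr '\001' '\n'                       # restore line breaks (adds a blank line per group)
}
```

For example, sort_groups exp_E polys.txt > sorted.txt (polys.txt being a hypothetical file name) writes the groups in ascending order of exp_E. Adding -r to the sort call should give descending order instead.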