Home > Mobile >  Sort file by different format of date field
Sort file by different format of date field

Time:09-23

I am trying to sort a file by a date field. I realize this has been done before, however, I cannot find an example that has the following date format.

Canada Goose   1x03   For the Triumph of Evil   Sep/30/2013
Rucksack   10x03   Everybody's Crying Mercy   Oct/03/13
Test   4x01   Season 4, Episode 1   Jun/01/14
New Family   3x03   Double Date   Oct/01/2013

I tried this command but it doesn't work

sort -t ' ' -k 4.8,4.11 -k 4.4M -k 4.1,4.2 -b Test.txt

CodePudding user response:

If you have a GNU awk installed, you may want to try this approach.

sort.awk

#!/bin/gawk -f
function convertToSeconds(date, fields) {
    split(date, fields, /\//)
    fields[1]=months[tolower(fields[1])]
    fields[2]=sprintf("d", fields[2])
    fields[3]=(length(fields[3]) == 2) ?  sprintf("2d", fields[3]) : fields[3]
    return mktime(sprintf("%s %s %s 00 00 00", fields[3], fields[1], fields[2]))
}
BEGIN {
    FS="( \\  )"
    months["jan"]="01"; months["feb"]="02"; months["mar"]="03"; months["apr"]="04"
    months["may"]="05"; months["jun"]="06"; months["jul"]="07"; months["aug"]="08"
    months["sep"]="09"; months["oct"]="10"; months["nov"]="11"; months["dec"]="12"
}
{
    arr[convertToSeconds($4)]=$0
}
END {
    asorti(arr, dst)
    for(i=1; i<=FNR;   i) {
        print arr[dst[i]]
    }
}

Give it an execute permission, then run it:

$ chmod  x ./sort.awk
$ ./sort.awk Test.txt

To save the changes into a new file, append this > operator.

$ ./sort.awk Test.txt > SortedTest.txt

CodePudding user response:

** UPDATE 1 **

revised sort key to explicitly list 4 digit year as prefix to circumvent year-end crossover issues

since OP only wants to sort date field, the exact epochs mapping isn't needed at all ::

mawk '$  NF = 366 * ( (_=($3) % 100)   1900   100 * (_<50) ) \

                   int(_ * 10^8)   ($2)   (31) * \
                   
(index("  JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper($2)) / 3 - 1)'
23284 SEP 30 2013 201300737036
23285 OCT  1 2013 201300737038
23287 OCT  3 2013 201300737040
23541 JUN 14 2014 201400737293

1st column is original date generation order (the correct rank ordering), and the last column is the calculated sort index value - i tested every date from jan 1st 1950 to dec 31 2025, and this simplistic approach ranks order just fine, even though it doesn't bother to calculate exact julian dates, or exact leap years,

since the objective is merely finding a rank ordering method that yields the same sorting output as exact epoch seconds

  • Related