I am trying to sort a file by a date field. I realize this has been done before, however, I cannot find an example that has the following date format.
Canada Goose 1x03 For the Triumph of Evil Sep/30/2013
Rucksack 10x03 Everybody's Crying Mercy Oct/03/13
Test 4x01 Season 4, Episode 1 Jun/01/14
New Family 3x03 Double Date Oct/01/2013
I tried this command but it doesn't work
sort -t ' ' -k 4.8,4.11 -k 4.4M -k 4.1,4.2 -b Test.txt
CodePudding user response:
If you have a GNU awk installed, you may want to try this approach.
sort.awk
#!/bin/gawk -f
function convertToSeconds(date, fields) {
split(date, fields, /\//)
fields[1]=months[tolower(fields[1])]
fields[2]=sprintf("d", fields[2])
fields[3]=(length(fields[3]) == 2) ? sprintf("2d", fields[3]) : fields[3]
return mktime(sprintf("%s %s %s 00 00 00", fields[3], fields[1], fields[2]))
}
BEGIN {
FS="( \\ )"
months["jan"]="01"; months["feb"]="02"; months["mar"]="03"; months["apr"]="04"
months["may"]="05"; months["jun"]="06"; months["jul"]="07"; months["aug"]="08"
months["sep"]="09"; months["oct"]="10"; months["nov"]="11"; months["dec"]="12"
}
{
arr[convertToSeconds($4)]=$0
}
END {
asorti(arr, dst)
for(i=1; i<=FNR; i) {
print arr[dst[i]]
}
}
Give it an execute permission, then run it:
$ chmod x ./sort.awk
$ ./sort.awk Test.txt
To save the changes into a new file, append this >
operator.
$ ./sort.awk Test.txt > SortedTest.txt
CodePudding user response:
** UPDATE 1 **
revised sort key to explicitly list 4 digit year as prefix to circumvent year-end crossover issues
since OP only wants to sort date field, the exact epochs mapping isn't needed at all ::
mawk '$ NF = 366 * ( (_=($3) % 100) 1900 100 * (_<50) ) \ int(_ * 10^8) ($2) (31) * \ (index(" JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper($2)) / 3 - 1)'
23284 SEP 30 2013 201300737036
23285 OCT 1 2013 201300737038
23287 OCT 3 2013 201300737040
23541 JUN 14 2014 201400737293
1st column is original date generation order (the correct rank ordering), and the last column is the calculated sort index value - i tested every date from jan 1st 1950
to dec 31 2025
, and this simplistic approach ranks order just fine, even though it doesn't bother to calculate exact julian dates, or exact leap years,
since the objective is merely finding a rank ordering method that yields the same sorting output as exact epoch seconds