Home > Net >  How to sort data based on date field by excluding header
How to sort data based on date field by excluding header

Time:09-17

I have a scenario where I have below data in file which need to be sorted based on date column , first line is the headers it should not be sorted

NAME|AGE|COURSE|DATES
v1|31|MC|12 JUL 2019
v2|33|MB|4  JUL 2019
v3|12|GG|13 JUL 2019
v4|21|JJ|7  JUL 2019

My code :

sort -n -k k4 /d/file.txt 

This above code does not sort my data

Expected Output :

NAME|AGE|COURSE|DATES
v4|21|JJ|7  JUL 2019
v2|33|MB|4  JUL 2019
v1|31|MC|12 JUL 2019
v3|12|GG|13 JUL 2019

CodePudding user response:

The way do to this is with Command Grouping where you can extract the header from an input stream, print it, and consume the remaining data:

{
    IFS= read -r header
    echo "$header"
    sort ...
} < file.txt

However, sorting dates with that format is tricky. Here's how you have to do it so the output is sorted chronologically. This assumes GNU sort:

$ cat file.txt          # I added a couple of extra records
NAME|AGE|COURSE|DATES
v1|31|MC|12 JUL 2019
v2|33|MB|4  JUL 2019
v3|12|GG|13 JUL 2019
v4|21|JJ|7  JUL 2019
11|22|33|1  JUL 2020
aa|bb|cc|10 AUG 2019

$ {
    IFS= read -r header
    echo "$header"
    sort -t'|' -n -s -k4 | sort -M -s -k 2,2 | sort -n -s -k 3,3
} < file.txt
NAME|AGE|COURSE|DATES
v2|33|MB|4  JUL 2019
v4|21|JJ|7  JUL 2019
v1|31|MC|12 JUL 2019
v3|12|GG|13 JUL 2019
aa|bb|cc|10 AUG 2019
11|22|33|1  JUL 2020

That uses the GNU sort "stable" option so you sort first by day, then by month, then by year.

CodePudding user response:

Borrowing @glennjackman's sample input, this will work using any versions of the mandatory Unix tools awk, sort, and cut:

$ awk '
    BEGIN { FS="|"; OFS="\t" }
    {
        split($NF,d," ")
        mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",d[2]) 2)/3
        print (NR>1), d[3], mthNr, d[1], NR, $0
    }
' file.txt |
sort -k1,1n -k2,2n -k3,3n -k4,4n -k5,5n |
cut -f6-
NAME|AGE|COURSE|DATES
v2|33|MB|4  JUL 2019
v4|21|JJ|7  JUL 2019
v1|31|MC|12 JUL 2019
v3|12|GG|13 JUL 2019
aa|bb|cc|10 AUG 2019
11|22|33|1  JUL 2020
  • Related