Home > Net >  I want to date sort the 5th col using sort command.But the problem is no consistency in format,what
I want to date sort the 5th col using sort command.But the problem is no consistency in format,what

Time:09-26

M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary 
M001,Richa,D001,27-Jan-07,27-Feb-07,150000 
M002,Nitin,D002,16-Feb-07,16-May-07,40000 
M003,AJIT,D003,8-Mar-07,8-Sep-07,70000 
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000 
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000 
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000 
M007,Usha,D003,17-Apr-07,17-Oct-07,70000 
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000 
M009,Yash,D006,11-Apr-07,11-Jul-07,85000 
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999

 tail -10 Joining_date.txt|awk -F\, '{print $1,$2,$3,$4,$5|("sort -t, -M");$6} '

Expected output

M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M008,anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000
M003,AJIT,D003,8-Mar-07,8-Sep-07,80000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000

CodePudding user response:

Like this:

tail -n 2 Joining_date.txt | sed -E 's/^(([^,] ,){4})([0-9]-)/\10\3/' | LC_ALL=C sort -t ',' -k 5.8n -k 5.4M -k 5.1n
  • tail -n 2 -- Get from 2nd line to the end.
  • sed -E 's/^(([^,] ,){4})([0-9]-)/\10\3/ -- Prepend a zero to the day of 5th field into two-digit day.
  • sort -t ',' -k 5.8n -k 5.4M -k 5.1n
    • -t ',' -- Set a field delimeter (,) for sort
    • -k 5.8n -- Sort by numeric years first.
    • -k 5.4M -- Sort by months next.
    • -k 5.1n -- Sort by numeric days.

CodePudding user response:

Applying the DSU idiom using any awk sort cut:

$ awk '
    BEGIN { FS=","; OFS="\t" }
    {
        split($5,d)
        mth = (index("JanFebMarAprMayJunJulAugSepOctNovDec",d[2]) 2)/3
        datec=csprintf("ddd", d[3], mth, d[1])
        print (NR>1), date, NR, $0
    }
' file |
sort -n -k1,2 -k3,3 |
cut -f4-
M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M003,AJIT,D003,8-Mar-07,8-Sep-07,70000
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999

The above ensures the header line will be printed first even if it has a number in the 5th field, and all lines with the same date in the 5th field will be printed in the original input order. Just change the 2nd { to NR>1 { if you really don't want the header line printed.

  • Related