How to sort DD-MM-YYYY dates linux-CodePudding

I have dates in a DD/MM/YYYY format, with the caveat that for days <10 dates are single digits.

Some example data is below:

name,date,time
DEF,1/02/2021,06:00
HIJ,31/01/2021,07:50
ABC,1/04/2021,05:50

I only want to sort by the date column. I find this challenging since the days part of the date value is variable in length.

CodePudding user response：

There isn't an easy way to sort the date field ([d]d/mm/yyyy) directly within a comma-separated field. However, picking out that date field and reordering the fields as yyyy-mm-dd makes it much easier. It's known as a Schwartzian transform.

Here you would sort the result, then throw away this first (temporary) field to regain the original data.

awk -F'[,/]' '{printf "d-d-d\t%s\n", $4,$3,$2,$0}' datafile | sort | cut -f2-

name,date,time
HIJ,31/01/2021,07:50
ABC,1/02/2021,05:50
DEF,1/02/2021,06:00

You can see the result of the intermediate process if you strip back either before or after the sort command.

CodePudding user response：

Split your data following the separators "," and "/" (using awk), you will end up with five columns:

DEF  1 02 2021 06:00
HIJ 31 01 2021 07:50
ABC  1 04 2021 05:50

Then, sort according to columns four, three and two numerically (sort -k4,3,2 -n or something similar).

You might decide to leave the data as they have been transformed (you'll up with this result):

HIJ 31 01 2021 07:50
DEF  1 02 2021 06:00
ABC  1 04 2021 05:50

... or you might want to put it back into the original formatting:

HIJ, 31/01/2021 07:50
DEF,  1/02/2021 06:00
ABC,  1/04/2021 05:50

CodePudding user response：

Using GNU awk, you can make use of it's internal sorting functions:

awk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"; FS=","}
     {split($2,b,"/"); a[b[3]*10000   b[2]*100   b[1]]=$0}
     END{for(i in a) print a[i]}' file

More details here