I have dates in a DD/MM/YYYY format, with the caveat that for days <10 dates are single digits.
Some example data is below:
name,date,time
DEF,1/02/2021,06:00
HIJ,31/01/2021,07:50
ABC,1/04/2021,05:50
I only want to sort by the date column. I find this challenging since the days part of the date value is variable in length.
CodePudding user response:
There isn't an easy way to sort the date field ([d]d/mm/yyyy
) directly within a comma-separated field. However, picking out that date field and reordering the fields as yyyy-mm-dd
makes it much easier. It's known as a Schwartzian transform.
Here you would sort the result, then throw away this first (temporary) field to regain the original data.
awk -F'[,/]' '{printf "d-d-d\t%s\n", $4,$3,$2,$0}' datafile | sort | cut -f2-
name,date,time
HIJ,31/01/2021,07:50
ABC,1/02/2021,05:50
DEF,1/02/2021,06:00
You can see the result of the intermediate process if you strip back either before or after the sort
command.
CodePudding user response:
Split your data following the separators "," and "/" (using awk
), you will end up with five columns:
DEF 1 02 2021 06:00
HIJ 31 01 2021 07:50
ABC 1 04 2021 05:50
Then, sort according to columns four, three and two numerically (sort -k4,3,2 -n
or something similar).
You might decide to leave the data as they have been transformed (you'll up with this result):
HIJ 31 01 2021 07:50
DEF 1 02 2021 06:00
ABC 1 04 2021 05:50
... or you might want to put it back into the original formatting:
HIJ, 31/01/2021 07:50
DEF, 1/02/2021 06:00
ABC, 1/04/2021 05:50
CodePudding user response:
Using GNU awk, you can make use of it's internal sorting functions:
awk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"; FS=","}
{split($2,b,"/"); a[b[3]*10000 b[2]*100 b[1]]=$0}
END{for(i in a) print a[i]}' file
More details here