Sorting bash script bug

Time:04-13

I'm trying to sort certificates by date, keeping only the latest one in a separate file.

Here's an example pki_certs.res input file for an example host, listing its past certificates unsorted, which I need to sort:

And here's the script to sort and pop the last one out:

cat "${_file}" | sort -k10,10 | sed -e 's/Not After : //' -e 's/GMT/GMT;/' | grep "${_domain}" | \
  while read line; do
    _first=`echo $line | cut -d';' -f1`
    _second=`echo $line | cut -d';' -f2-`
    _date=`date -d "${_first}" +%Y%m%d%H%M`
    echo "$_date $_second"
  done |sort -k 3,3 -k 1,1r | awk "{if (i[\$3] < \$1) i[\$3]=\$1} END{for(x in i){ print x\" \"i[x] }}" | \
  sed -e 's/CN=//g' | sort -k 2,2 > pki_certs.final.sorted

The trouble is that pki_certs.final.sorted ends up with the certificate before the most recent one, not the most recent one.

Expected output:

Apr 7 20:09:26 2023

but instead of that output I get this:

Apr 12 18:12:02 2022

Any thoughts on what I am missing, please?

CodePudding user response:

Apply the DSU (Decorate/Sort/Undecorate) idiom using any version of the mandatory Unix tools awk, sort, cut, and head to get the whole line output:

$ awk '{printf "%04d%02d%02d%s\t%s\n", $7, (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3, $5, $6, $0}' file |
    sort -r | head -1 | cut -f2-
pki_certs.res:Not After : Apr 7 20:09:26 2023 GMT DNS:MDVARTREPO01.cpp.nonlive

or just the date:

$ awk '{printf "%04d%02d%02d%s\t%s\n", $7, (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3, $5, $6, $4" "$5" "$6" "$7}' file |
    sort -r | head -1 | cut -f2-
Apr 7 20:09:26 2023

The awk command prepends a sortable version of the timestamp to each line, sort then orders the lines by that timestamp (newest first), head keeps the top line, and cut removes the string that awk added. Seeing the intermediate output from each step shows how it works:

$ awk '{printf "%04d%02d%02d%s\t%s\n", $7, (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3, $5, $6, $4" "$5" "$6" "$7}' file
2023040720:09:26        Apr 7 20:09:26 2023
2020050712:05:44        May 7 12:05:44 2020
2021040817:06:54        Apr 8 17:06:54 2021
2020050711:58:19        May 7 11:58:19 2020
2021040917:42:27        Apr 9 17:42:27 2021
2021041709:09:35        Apr 17 09:09:35 2021
2021040917:02:43        Apr 9 17:02:43 2021
2022041218:12:02        Apr 12 18:12:02 2022

$ awk '{printf "%04d%02d%02d%s\t%s\n", $7, (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3, $5, $6, $4" "$5" "$6" "$7}' file | sort -r
2023040720:09:26        Apr 7 20:09:26 2023
2022041218:12:02        Apr 12 18:12:02 2022
2021041709:09:35        Apr 17 09:09:35 2021
2021040917:42:27        Apr 9 17:42:27 2021
2021040917:02:43        Apr 9 17:02:43 2021
2021040817:06:54        Apr 8 17:06:54 2021
2020050712:05:44        May 7 12:05:44 2020
2020050711:58:19        May 7 11:58:19 2020

$ awk '{printf "%04d%02d%02d%s\t%s\n", $7, (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3, $5, $6, $4" "$5" "$6" "$7}' file | sort -r | head -1
2023040720:09:26        Apr 7 20:09:26 2023

$ awk '{printf "%04d%02d%02d%s\t%s\n", $7, (index("JanFebMarAprMayJunJulAugSepOctNovDec",$4)+2)/3, $5, $6, $4" "$5" "$6" "$7}' file | sort -r | head -1 | cut -f2-
Apr 7 20:09:26 2023
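
For reuse, the decorate/sort/undecorate steps walked through above could be wrapped in a small shell function. This is only a sketch: the name latest_cert_line is made up here, and it assumes every input line follows the layout of the sample (month in $4, day in $5, time in $6, year in $7).

```shell
# Hypothetical helper: print the input line with the newest "Not After" date.
# Assumption: fields are laid out as in the sample lines
#   $4 = month name, $5 = day, $6 = HH:MM:SS, $7 = year
# Steps: awk decorates each line with a YYYYMMDDHH:MM:SS key and a tab,
# sort -r puts the newest key first, head keeps that line,
# and cut strips the key again.
latest_cert_line() {
    awk '{
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $4) + 2) / 3
        printf "%04d%02d%02d%s\t%s\n", $7, mon, $5, $6, $0
    }' "$1" |
    LC_ALL=C sort -r |
    head -n 1 |
    cut -f2-
}
```

Calling latest_cert_line pki_certs.res would then print the whole line of the newest certificate; LC_ALL=C is there only to keep the key comparison independent of the locale.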

By the way in your code where you have:

awk "{if (i[\$3] < \$1) i[\$3]=\$1} END{for(x in i){ print x\" \"i[x] }}"

you have to escape all those symbols because you're using double quotes, which invites the shell to interpret the script before awk sees it. Don't do that; use single quotes, as you always should unless you have a specific reason not to:

awk '{if (i[$3] < $1) i[$3]=$1} END{for(x in i){ print x" "i[x] }}'
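
A tiny experiment makes the difference visible. The function names and the variable col below are made up for this illustration; the point is only what awk receives in each case.

```shell
# All three functions feed the line "a b" to awk.

# Double quotes: the shell replaces $2 with the function's own second
# argument (empty when called without arguments) before awk starts,
# so awk receives the program { print  } and prints the whole line.
demo_double() { echo 'a b' | awk "{ print $2 }"; }

# Single quotes: awk receives { print $2 } untouched and prints field 2.
demo_single() { echo 'a b' | awk '{ print $2 }'; }

# When a shell value is really needed inside the script, pass it in with -v
# and keep the script itself in single quotes.
demo_var() { echo 'a b' | awk -v col="$1" '{ print $col }'; }
```

Called without arguments, demo_double prints "a b" while demo_single prints "b"; demo_var 1 prints "a".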

CodePudding user response:

It's not clear (to me) what OP's final objective is, so fwiw ...

Assumptions/Understandings:

  • objective #1 - print the latest/newest date
  • objective #2 - print the line containing the latest/newest date
  • objective #3 - sort the entire input file based on datetime stamps
  • all cert details reside on a single line (ie, an individual cert's details do not span multiple lines)
  • each line contains a datetime stamp of the form 'mmm dd HH:MM:SS yyyy'

One idea using GNU awk:

awk '   # define field pattern as " mmm dd HH:MM:SS yyyy "
BEGIN { FPAT=" [[:alpha:]]{3} [0-9]{1,2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9] [0-9]{4} "

        # build array of months to allow converting from 3-character to numeric
        n=split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec",arr,":")
        for (i=1;i<=n;i++)
            month[(arr[i])]=i
      }

      { n=split($1,arr)                        # split 1st (and only?) FPAT matching field on white space
        gsub(/:/," ",arr[3])                   # convert ":" to " "

        # convert current datetime stamp to epoch seconds
        epoch=mktime(arr[4] " " month[arr[1]] " " arr[2] " " arr[3])

        if (epoch > maxepoch+0) {
           maxepoch=epoch
           dt=$1                               # save current "max" datetime stamp
        }
        lines[epoch]=$0                        # save current line, indexed by epoch
      }

END   { print dt                               # objective #1: print max datetime stamp
        print lines[maxepoch]                  # objective #2: print cert line containing max datetime stamp

        PROCINFO["sorted_in"]="@ind_num_asc"   # sort array by numeric index in ascending order
        for (i in lines)                       # objective #3: print lines[] array in epoch ascending order
            print lines[i] > "pki_certs.final.sorted"
      }
' cert.dat

NOTES:

  • requires GNU awk for a) FPAT, b) the mktime() function and c) the PROCINFO["sorted_in"] sorting directive
  • replaces all of OP's current code
  • OP can modify the END {...} block based on expected results
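
Where GNU awk isn't available, objectives #1 and #2 could be approximated with plain POSIX awk. Note this is a different technique from the mktime() approach above: instead of epoch seconds it compares a zero-padded YYYYMMDDHH:MM:SS string, and instead of FPAT it assumes the fixed field positions of the sample lines (month in $4, day in $5, time in $6, year in $7). The function name newest_cert is hypothetical.

```shell
# Hypothetical POSIX-awk sketch for objectives #1 and #2 only.
# Assumption: $4 = month name, $5 = day, $6 = HH:MM:SS, $7 = year.
newest_cert() {
    awk '
    BEGIN { months = "JanFebMarAprMayJunJulAugSepOctNovDec" }
    {
        # build a fixed-width key so plain string comparison orders by time
        key = sprintf("%04d%02d%02d%s", $7, (index(months, $4) + 2) / 3, $5, $6)
        if (key > max) {
            max  = key
            dt   = $4 " " $5 " " $6 " " $7   # datetime stamp of the newest line
            line = $0                        # the newest line itself
        }
    }
    END {
        print dt      # objective #1
        print line    # objective #2
    }' "$1"
}
```

Because every key has the same width, string comparison gives the same ordering as comparing the timestamps themselves; objective #3 would still need sort or GNU awk.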

This generates:

# objective #1:

 Apr 7 20:09:26 2023

# objective #2:

pki_certs.res:Not After : Apr 7 20:09:26 2023 GMT DNS:MDVARTREPO01.cpp.nonlive

# objective #3:

$ cat pki_certs.final.sorted
pki_certs.res:Not After : May 7 11:58:19 2020 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : May 7 12:05:44 2020 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : Apr 8 17:06:54 2021 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : Apr 9 17:02:43 2021 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : Apr 9 17:42:27 2021 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : Apr 17 09:09:35 2021 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : Apr 12 18:12:02 2022 GMT Subject: CN=MDVARTREPO01.cpp.nonlive DNS:MDVARTREPO01.cpp.nonlive
pki_certs.res:Not After : Apr 7 20:09:26 2023 GMT DNS:MDVARTREPO01.cpp.nonlive