parse ics and create output

Time:10-19

This is an ics file containing three test events. Using bash, I want to create a table containing just the DTSTART and SUMMARY.

BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:[email protected]
X-WR-TIMEZONE:Asia/Nicosia
BEGIN:VEVENT
DTSTART:20221016T050000Z
DTEND:20221016T053000Z
DTSTAMP:20221016T144146Z
UID:[email protected]
CREATED:20221016T143242Z
LAST-MODIFIED:20221016T143242Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:test event 8am
TRANSP:OPAQUE
END:VEVENT
BEGIN:VEVENT
DTSTART:20221016T040000Z
DTEND:20221016T043000Z
DTSTAMP:20221016T144146Z
UID:[email protected]
CREATED:20221016T143233Z
LAST-MODIFIED:20221016T143233Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:test event 7am
TRANSP:OPAQUE
END:VEVENT
BEGIN:VEVENT
DTSTART:20221016T030000Z
DTEND:20221016T033000Z
DTSTAMP:20221016T144146Z
UID:[email protected]
CREATED:20221016T143220Z
LAST-MODIFIED:20221016T143220Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:test event 6am
TRANSP:OPAQUE
END:VEVENT
END:VCALENDAR

So the output should be

16/10/2022 08:00 test event 8am
16/10/2022 07:00 test event 7am
16/10/2022 06:00 test event 6am

One thing I noticed is that even though the events were scheduled for 6am, 7am and 8am, the times in the ics file are given in UTC (UTC+0). I'm located at UTC+3, so the DTSTART values are 3 hours behind. No idea why.

CodePudding user response:

Using GNU awk

gawk -F: '
    @include "join"
    function timestamp(ts,    m, t, epoch) {
        if (match(ts, /([0-9]{4})([0-9]{2})([0-9]{2})T([0-9]{2})([0-9]{2})([0-9]{2})Z/, m)) {
            t = join(m, 1, 6, " ")
            epoch = mktime(t, 1)
            return strftime("%d/%m/%Y %H:%M", epoch, 0)
        }
        else
            return ts
    }
    /BEGIN:VEVENT/,/END:VEVENT/ {
        if ($1 == "DTSTART") printf "%s ", timestamp($2)
        if ($1 == "SUMMARY") print $2
    }
' file.ics

which, for me in UTC-04:00 time zone, outputs

16/10/2022 01:00 test event 8am
16/10/2022 00:00 test event 7am
15/10/2022 23:00 test event 6am

For a timezone that's UTC+3

TZ=Africa/Djibouti gawk -F: '...' file.ics

we see

16/10/2022 08:00 test event 8am
16/10/2022 07:00 test event 7am
16/10/2022 06:00 test event 6am

Ref: Time Functions in the gawk manual.
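
As a quick cross-check of the time zone handling, a single DTSTART value can be converted with GNU date. The trailing Z in an iCalendar timestamp marks the value as UTC, which is why the file shows 05:00 for an event created at 08:00 in a UTC+3 zone. This is only a minimal sketch: it assumes GNU coreutils date and an installed tz database, the timestamp string is rewritten by hand from DTSTART:20221016T050000Z, and Asia/Nicosia is taken from the X-WR-TIMEZONE line of the file.

```shell
# Assumes GNU date; BSD/macOS date uses different options.
utc_stamp='2022-10-16 05:00:00 UTC'   # DTSTART:20221016T050000Z, rewritten for date -d

# Same instant, formatted once in UTC and once in the calendar's zone.
utc_time=$(TZ=UTC0 date -d "$utc_stamp" '+%d/%m/%Y %H:%M')
local_time=$(TZ=Asia/Nicosia date -d "$utc_stamp" '+%d/%m/%Y %H:%M')

echo "UTC:   $utc_time"
echo "Local: $local_time"
```

With the zone data available, the local line should read 16/10/2022 08:00, matching the first row of the gawk output above.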

CodePudding user response:

I would use GNU AWK in the following way. Let the content of file.txt be

BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:[email protected]
X-WR-TIMEZONE:Asia/Nicosia
BEGIN:VEVENT
DTSTART:20221016T050000Z
DTEND:20221016T053000Z
DTSTAMP:20221016T144146Z
UID:[email protected]
CREATED:20221016T143242Z
LAST-MODIFIED:20221016T143242Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:test event 8am
TRANSP:OPAQUE
END:VEVENT
BEGIN:VEVENT
DTSTART:20221016T040000Z
DTEND:20221016T043000Z
DTSTAMP:20221016T144146Z
UID:[email protected]
CREATED:20221016T143233Z
LAST-MODIFIED:20221016T143233Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:test event 7am
TRANSP:OPAQUE
END:VEVENT
BEGIN:VEVENT
DTSTART:20221016T030000Z
DTEND:20221016T033000Z
DTSTAMP:20221016T144146Z
UID:[email protected]
CREATED:20221016T143220Z
LAST-MODIFIED:20221016T143220Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:test event 6am
TRANSP:OPAQUE
END:VEVENT
END:VCALENDAR

then

awk 'BEGIN{FS=":"}/^DTSTART/{dtstart=$2}/^SUMMARY/{summary=$2}/^END:VEVENT/{print substr(dtstart,7,2)"/"substr(dtstart,5,2)"/"substr(dtstart,1,4),sprintf("%02d",substr(dtstart,10,2)+3)":"substr(dtstart,12,2),summary}' file.txt

gives output

16/10/2022 08:00 test event 8am
16/10/2022 07:00 test event 7am
16/10/2022 06:00 test event 6am

Explanation: I tell GNU AWK that the field separator (FS) is a colon (:). For lines starting with DTSTART, I store the 2nd field in the dtstart variable; for lines starting with SUMMARY, I store the 2nd field in the summary variable. For lines starting with END:VEVENT, I print the collected data: the date and time are assembled by extracting pieces with the substr function and concatenating them. For the hour, I add 3 as stipulated by the requirements and then format it with a leading zero (to get a fixed width) using the sprintf function.
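
One consequence of splitting on the colon is worth seeing in isolation: a summary that itself contains a colon is cut short when only the 2nd field is used. The sketch below uses a made-up SUMMARY line (not from the file above) and shows one possible way to keep everything after the first colon.

```shell
# Hypothetical line, chosen because the summary text contains a colon.
line='SUMMARY:Lunch: team sync'

# With FS=":" the line splits into "SUMMARY", "Lunch" and " team sync".
second_field=$(printf '%s\n' "$line" | awk -F: '{ print $2 }')

# Alternative: take everything after the first colon, whatever it contains.
full_value=$(printf '%s\n' "$line" | awk '{ print substr($0, index($0, ":") + 1) }')

echo "$second_field"   # Lunch
echo "$full_value"     # Lunch: team sync
```

For the three test events this makes no difference, since none of their summaries contains a colon.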

Disclaimer: I assume the time offset is set in stone AND that it is acceptable to have hours outside the 0...23 range, e.g. you might get 16/10/2022 25:30 because of the added three hours. Feel free to improve that if it is not acceptable, but keep in mind that a correct fix must handle the day rolling over, which may in turn change the month and even the year (e.g. 31/12/2021 25:30).

(tested in gawk 4.2.1)
