Home > Software engineering >  How to filter jCal with jq?
How to filter jCal with jq?

Time:11-10

I have an jCal JSON array which I'd like to filter with jq. JSON arrays are somewhat new to me and I have been banging my head to the wall on this for hours...

The file looks like this:

[
  "vcalendar",
  [
    [
      "calscale",
      {},
      "text",
      "GREGORIAN"
    ],
    [
      "version",
      {},
      "text",
      "2.0"
    ],
    [
      "prodid",
      {},
      "text",
      "-//SabreDAV//SabreDAV//EN"
    ],
    [
      "x-wr-calname",
      {},
      "unknown",
      "Call log private"
    ],
    [
      "x-apple-calendar-color",
      {},
      "unknown",
      "#ffaa00"
    ],
    [
      "refresh-interval",
      {},
      "duration",
      "PT4H"
    ],
    [
      "x-published-ttl",
      {},
      "unknown",
      "PT4H"
    ]
  ],
  [
    [
      "vevent",
      [
        [
          "dtstamp",
          {},
          "date-time",
          "2015-04-05T16:42:10Z"
        ],
        [
          "created",
          {},
          "date-time",
          "2015-02-18T16:44:04Z"
        ],
        [
          "uid",
          {},
          "text",
          "9b23142b-8d86-3e17-2f44-2bed65b2e471"
        ],
        [
          "last-modified",
          {},
          "date-time",
          "2015-04-05T16:42:10Z"
        ],
        [
          "description",
          {},
          "text",
          "Phone call to  49xxxxxxxxxx lasted for 0 seconds."
        ],
        [
          "summary",
          {},
          "text",
          "Outgoing:  49xxxxxxx"
        ],
        [
          "dtstart",
          {},
          "date-time",
          "2015-02-18T10:58:12Z"
        ],
        [
          "dtend",
          {},
          "date-time",
          "2015-02-18T10:58:44Z"
        ],
        [
          "transp",
          {},
          "text",
          "OPAQUE"
        ]
      ],
      []
    ],
    [
      "vevent",
      [
        [
          "dtstamp",
          {},
          "date-time",
          "2015-04-05T16:42:10Z"
        ],
        [
          "created",
          {},
          "date-time",
          "2015-01-09T19:12:05Z"
        ],
        [
          "uid",
          {},
          "text",
          "c337e092-a012-5f5a-497f-932fbc6159e5"
        ],
        [
          "last-modified",
          {},
          "date-time",
          "2015-04-05T16:42:10Z"
        ],
        [
          "description",
          {},
          "text",
          "Phone call to  1xxxxxxxxxx lasted for 39 seconds."
        ],
        [
          "summary",
          {},
          "text",
          "Outgoing:  1xxxxxxxxxx"
        ],
        [
          "dtstart",
          {},
          "date-time",
          "2015-01-09T17:23:16Z"
        ],
        [
          "dtend",
          {},
          "date-time",
          "2015-01-09T17:24:19Z"
        ],
        [
          "transp",
          {},
          "text",
          "OPAQUE"
        ]
      ],
      []
    ],
  ]
]

I would like to filter out dtstart, dtend, the target phone number and the connection duration from the description for each vevent which was created e.g. in January 2019 ("2019-01.*") and output them as a CSV.

CodePudding user response:

This JSON is a bit strange because the information is stored position-based in an array instead of an object. Using the first element of an array ("vevent") to identify its contents is not the best practice.

But anyway ... if this is the data source you are dealing with, this code should help you.

jq -r '..
       | arrays
       | select(.[0] == "vevent")[1]
       | [
           (.[] | select(.[0] == "dtstart") | .[3]),
           (.[] | select(.[0] == "dtend") | .[3]),
           (.[] | select(.[0] == "description") | .[3])
         ]
       | @csv
      '

Alternatively, the repeating code can be transferred into a function

jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
       ..
       | arrays
       | select(.[0] == "vevent")[1]
       | [ getField("dtstart"; 3), getField("dtend"; 3), getField("description"; 3) ]
       | @csv
      '

Output

"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z","Phone call to  49xxxxxxxxxx lasted for 0 seconds."
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to  1xxxxxxxxxx lasted for 39 seconds."

You can also extract phone number and duration with the help of regular expressions in jq:

jq -r 'def getField($name; $idx): .[] | select(.[0] == $name) | .[$idx];
       ..
       | arrays
       | select(.[0] == "vevent")[1]
       | [
           getField("dtstart"; 3),
           getField("dtend"; 3),
           (getField("description"; 3) | match("call to ([^ ]*)") | .captures[0].string),
           (getField("description"; 3) | match("(\\d ) seconds")  | .captures[0].string)
         ]
       | @csv
      '

Output

"2015-02-18T10:58:12Z","2015-02-18T10:58:44Z"," 49xxxxxxxxxx","0"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z"," 1xxxxxxxxxx","39"

CodePudding user response:

Not the most efficient solution, but quite understandable by first building an object out of key-value pairs and then filtering and transforming those.

.[2][][1] is a stream of events encoded as arrays.

Which means that:

.[2][][1]
  | map({key:.[0], value:.[3]})
  | from_entries

the above gives you a stream of objects; one object per event:

{
  "dtstamp": "2015-04-05T16:42:10Z",
  "created": "2015-02-18T16:44:04Z",
  "uid": "9b23142b-8d86-3e17-2f44-2bed65b2e471",
  "last-modified": "2015-04-05T16:42:10Z",
  "description": "Phone call to  49xxxxxxxxxx lasted for 0 seconds.",
  "summary": "Outgoing:  49xxxxxxx",
  "dtstart": "2015-02-18T10:58:12Z",
  "dtend": "2015-02-18T10:58:44Z",
  "transp": "OPAQUE"
}
{
  "dtstamp": "2015-04-05T16:42:10Z",
  "created": "2015-01-09T19:12:05Z",
  "uid": "c337e092-a012-5f5a-497f-932fbc6159e5",
  "last-modified": "2015-04-05T16:42:10Z",
  "description": "Phone call to  1xxxxxxxxxx lasted for 39 seconds.",
  "summary": "Outgoing:  1xxxxxxxxxx",
  "dtstart": "2015-01-09T17:23:16Z",
  "dtend": "2015-01-09T17:24:19Z",
  "transp": "OPAQUE"
}

Now plug that into the final program: select the wanted objects, add CSV headers, build the rows and ultimately convert to CSV:

["start", "end", "description"],
(
  .[2][][1]
  | map({key:.[0], value:.[3]})
  | from_entries
  | select(.created | startswith("2015-01"))
  | [.dtstart, .dtend, .description]
)
| @csv

Raw output (-r):

"start","end","description"
"2015-01-09T17:23:16Z","2015-01-09T17:24:19Z","Phone call to  1xxxxxxxxxx lasted for 39 seconds."

If you need to further transform .description, you can use split or capture. Or use a different property, such as .summary, in your CSV rows. Only a single line needs to be changed.

  • Related