Separate JSON and non-JSON logs with jq?


I have some log files which contain a mix of JSON and non-JSON logs. I'd like to separate them into two files: one containing only the JSON logs, the other containing the non-JSON logs. I got some ideas from this for extracting JSON logs with jq. Here is what I have tried, using tee to split the log into two files (usage from here & here) and jq to extract the logs:

cat $logfile | tee >(jq -R -c 'fromjson? | select(type == "object") | not' > $plain_log_file) >(jq -R -c 'fromjson? | select(type == "object")' > $json_log_file)

This extracts the JSON logs correctly, but for each non-JSON log it writes false instead of the log line itself.

cat $logfile | tee >(jq -R -c 'try fromjson catch .' > $plain_log_file) >(jq -R -c 'fromjson? | select(type == "object")' > $json_log_file)

This gets a jq syntax error at "catch .".

Any suggestions on how to achieve this? I appreciate your help!

sample input:

{ "name": "joe"}
text line, this can be multi-line too
{ "xyz": 123 }

CodePudding user response:

Assuming each JSON log item occurs on a separate line:

For the JSON logs:

jq -nR -c 'inputs|fromjson?'

For the others, you could use:

jq -nRr  'inputs | . as $in | try (fromjson|empty) catch $in'
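If you want both files written in one pass, these two filters can be plugged into the tee approach from the question. A minimal sketch, assuming the same $logfile, $json_log_file and $plain_log_file variables as in the question:

# Sketch: split $logfile into JSON and non-JSON logs in a single pass,
# using the two filters above inside process substitutions.
cat $logfile | tee \
  >(jq -nR -c 'inputs | fromjson?' > $json_log_file) \
  >(jq -nRr 'inputs | . as $in | try (fromjson|empty) catch $in' > $plain_log_file) \
  > /dev/null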

CodePudding user response:

If you only want to separate the input line by line into different files, go with @peak's solution. But if you want to further process the lines based on conditions, you could turn them into an array using -Rn and [inputs], and go from there. For instance, if you need the corresponding line numbers (e.g. to feed them into another tool such as sed; see the sketch after the demo), use to_entries, which for arrays provides them in the .key field:

jq -Rn 'reduce ([inputs] | to_entries[]) as $in ({};
  .[($in.value | fromjson? | "json") // "plain"] += [$in.key]
)'
{
  "json": [
    0,
    2
  ],
  "plain": [
    1
  ]
}

Demo
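
As an example of the sed idea mentioned above, the zero-based indices in the "plain" field could be turned into one-based sed print commands to pull the non-JSON lines back out. This is only a sketch; the plain_lines.sed file name and the jq post-processing step are assumptions, not part of the answer above:

# Emit a sed script such as "2p" for every non-JSON line
# (jq indices are zero-based, sed addresses are one-based, hence + 1).
jq -Rnr 'reduce ([inputs] | to_entries[]) as $in ({};
    .[($in.value | fromjson? | "json") // "plain"] += [$in.key]
  ) | .plain[]? | "\(. + 1)p"' $logfile > plain_lines.sed

# Print only the non-JSON lines.
sed -n -f plain_lines.sed $logfile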
