Parse JSON field contained in CSV file using jq

Time:04-09

I have a log file from our friends at Microsoft in a very challenging format. The file is as follows:

  • File is a .CSV
  • Four fields, the fourth contains JSON
  • All double quotes inside the JSON field are doubled ("" instead of "), per standard CSV quoting

I have an export of several of these files, and I want to quickly parse them in the terminal using grep to find key events.

Sanitized Example:

CreationDate,UserIds,Operations,AuditData
2022-01-01T15:00:00.0000000Z,[email protected],FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:00:00"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""[email protected]"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
2022-01-01T15:01:15.0000000Z,[email protected],FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:01:15"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""[email protected]"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
2022-01-01T15:02:02.0000000Z,[email protected],FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:02:02"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""[email protected]"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"

I am attempting to parse the files with a combination of cut and jq in the terminal, but I am struggling because cut does not handle a comma delimiter well when the JSON field is riddled with commas. I could convert the files to tab-delimited, but ideally I want to avoid that: I'd like to inspect the logs for key events on the fly, without having to open each one and convert the format.
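The failure mode is easy to reproduce with a made-up one-line record in the same shape: cut splits on every comma, including the ones inside the quoted JSON field.

```shell
# Made-up record: field 4 is quoted JSON with doubled double quotes
# and embedded commas. cut is not CSV-aware, so field 4 ends at the
# first comma inside the JSON.
printf '%s\n' 'a,b,c,"{""k"":""v"",""n"":1}"' | cut -d, -f4
# prints: "{""k"":""v""
```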

Where I am at:

grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4 | jq .

Output:

"{"
"AppAccessContext"
":{"
"CorrelationId"
":"
"f6298547-d934-4c79-8bab-c5c394f31f65"

Output I want:

{
    "AppAccessContext":
    {
        "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"
    },
    "CreationTime": "2022-01-01T15:00:00",
    "Id": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Operation": "FileViewed",
    "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "RecordType": 0,
    "UserType": 0,
    "Version": 0,
    "Workload": "OneDrive",
    "ClientIP": "172.0.0.1",
    "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt",
    "UserId": "[email protected]",
    "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "EventSource": "SharePoint",
    "ItemType": "File",
    "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Site": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "SourceFileName": "TextFile.txt",
    "SourceRelativeUrl": "Documents"
}
...

I have already used an alternative tool to analyze these logs, but I wanted to post the question here to see whether it is achievable to parse them in the terminal using cut, jq, or any other commands.

CodePudding user response:

You might want to try Miller (mlr), which is available as a stand-alone executable for various operating systems.

With Miller, the parsing and conversion of your CSV containing JSON fields becomes a breeze:

mlr --icsv --ojson json-parse AnnoyingLogFile.csv
[
{
  "CreationDate": "2022-01-01T15:00:00.0000000Z",
  "UserIds": "[email protected]",
  "Operations": "FileViewed",
  "AuditData": {
    "AppAccessContext": {
      "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"
    },
    "CreationTime": "2022-01-01T15:00:00",
    "Id": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Operation": "FileViewed",
    "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "RecordType": 0,
    "UserType": 0,
    "Version": 0,
    "Workload": "OneDrive",
    "ClientIP": "172.0.0.1",
    "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt",
    "UserId": "[email protected]",
    "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "EventSource": "SharePoint",
    "ItemType": "File",
    "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Site": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "SourceFileName": "TextFile.txt",
    "SourceRelativeUrl": "Documents"
  }
}, ...

And for outputting a stack of JSON objects equivalent to your expected output:

mlr --icsv --ojsonl json-parse then filter 'emit1 $AuditData; false;' AnnoyingLogFile.csv
{"AppAccessContext": {"CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"}, "CreationTime": "2022-01-01T15:00:00", "Id": "f6298547-d934-4c79-8bab-c5c394f31f65", "Operation": "FileViewed", "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "RecordType": 0, "UserType": 0, "Version": 0, "Workload": "OneDrive", "ClientIP": "172.0.0.1", "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt", "UserId": "[email protected]", "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "EventSource": "SharePoint", "ItemType": "File", "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65", "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65", "Site": "f6298547-d934-4c79-8bab-c5c394f31f65", "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65", "SourceFileName": "TextFile.txt", "SourceRelativeUrl": "Documents"}
{"AppAccessContext": {"CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"}, "CreationTime": "2022-01-01T15:01:15", "Id": "f6298547-d934-4c79-8bab-c5c394f31f65", "Operation": "FileViewed", "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "RecordType": 0, "UserType": 0, "Version": 0, "Workload": "OneDrive", "ClientIP": "172.0.0.1", "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt", "UserId": "[email protected]", "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "EventSource": "SharePoint", "ItemType": "File", "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65", "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65", "Site": "f6298547-d934-4c79-8bab-c5c394f31f65", "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65", "SourceFileName": "TextFile.txt", "SourceRelativeUrl": "Documents"}
{"AppAccessContext": {"CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"}, "CreationTime": "2022-01-01T15:02:02", "Id": "f6298547-d934-4c79-8bab-c5c394f31f65", "Operation": "FileViewed", "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "RecordType": 0, "UserType": 0, "Version": 0, "Workload": "OneDrive", "ClientIP": "172.0.0.1", "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt", "UserId": "[email protected]", "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "EventSource": "SharePoint", "ItemType": "File", "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65", "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65", "Site": "f6298547-d934-4c79-8bab-c5c394f31f65", "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65", "SourceFileName": "TextFile.txt", "SourceRelativeUrl": "Documents"}
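If Miller isn't at hand, the same parse can be sketched with Python's standard library alone: the csv module undoes the doubled double quotes automatically, so the fourth field comes out as plain JSON text. (The function name and the operation filter below are illustrative, not part of any existing tool.)

```python
import csv
import json
import sys

def audit_events(path, operation=None):
    """Yield the parsed AuditData object of each row, optionally
    filtered by the Operations column."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            # csv already unescaped "" -> ", so this is valid JSON text
            if operation and row["Operations"] != operation:
                continue
            yield json.loads(row["AuditData"])

if __name__ == "__main__":
    for event in audit_events(sys.argv[1], operation="FileViewed"):
        print(json.dumps(event, indent=4))
```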

CodePudding user response:

Probably not the best approach, but it works:

grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4- | sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//' | jq .

The first sed expression replaces "" with ", the second removes the " at the beginning of the line, and the third removes the " at the end.

If the JSON is not the last column, you can use rev to reverse each line, cut the trailing fields off from the end, and rev it back.
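For example, with a made-up layout where one comma-free column follows the JSON, the trailing field can be stripped from the end before cutting away the leading ones (the column counts here are illustrative, and this only holds up if the trailing columns contain no commas):

```shell
# Made-up record: JSON in field 4, one extra field after it.
line='a,b,c,"{""k"":""v"",""n"":1}",trailing'
# rev reverses the line, cut -f2- drops the (now leading) last field,
# rev restores the order, then -f4- keeps the JSON with its embedded
# commas intact; the sed from above unescapes the doubled quotes.
printf '%s\n' "$line" \
  | rev | cut -d, -f2- | rev \
  | cut -d, -f4- \
  | sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//'
# prints: {"k":"v","n":1}
```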
