Text line to JSON using Awk

Time:05-03

I have lines like this in a file.

Anywhere                   DENY IN     5.255.250.115              # Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Anywhere                   DENY IN     46.229.168.153             # Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \003 from port 6493
7
Anywhere                   DENY IN     213.180.203.49             # Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi
on/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)

I would like to transform these and pull out only the IP address and the comment at the end of each line.

Here is what I have so far:

grep 'DENY IN' tmp.txt | awk '{printf "{\"ip\":\"%s\",\"reason\":\"%s\"},", $4, substr($0,index($0,$5))}' | sed 's/,$//g' | awk '{ printf "[%s]", $0}'
[{"ip":"5.255.250.115","reason":"# Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"},{"ip":"46.229.168.153","reason":"# Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \003 from port 64347"},{"ip":"213.180.203.49","reason":"# Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi"}]

The issue is that the comment string is not a valid JSON value: the unescaped backslash in \003 is not a legal JSON escape sequence.

When I feed that to python3 -m json.tool, I get this error:

grep 'DENY IN' tmp.txt | awk '{printf "{\"ip\":\"%s\",\"reason\":\"%s\"},", $4, substr($0,index($0,$5))}' | sed 's/,$//g' | awk '{ printf "[%s]", $0}' | python3 -m json.tool
Invalid \escape: line 1 column 218 (char 217)

Is there any way to print the comment as a properly escaped JSON string using Awk or any other command-line tool?

Thanks

CodePudding user response:

Are you looking for something like this? As jq is tagged, this reads each line as a raw string using -R, splits it on runs of at least two whitespace characters, and builds the object from the last two columns (jq escapes the backslash itself when it prints the string):

jq -nR '[inputs | [splits("\\s{2,}")] | {ip:.[2], reason:.[3]}]'
[
  {
    "ip": "5.255.250.115",
    "reason": "# Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
  },
  {
    "ip": "46.229.168.153",
    "reason": "# Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \\003 from port 64937"
  },
  {
    "ip": "213.180.203.49",
    "reason": "# Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)"
  }
]
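
If the columns are not always separated by two or more spaces, or the file also contains lines that are not DENY IN rules, a regex capture may be a bit more forgiving than splitting. The following is only a sketch under assumptions: the wrapper name deny_lines_to_json is made up for illustration, each rule is assumed to sit on a single line, and the comment is assumed to start at the first "#" after the IP address. Lines that do not match the pattern are dropped, and the leading "# " is not kept in the reason.

```shell
# deny_lines_to_json is a hypothetical wrapper name used only in this sketch.
deny_lines_to_json() {
  # -n: do not read input implicitly; -R: treat every input line as a raw string.
  # capture() emits an object built from the named groups and emits nothing
  # for lines that do not match, so unrelated lines are filtered out.
  jq -nR '
    [ inputs
      | capture("DENY IN\\s+(?<ip>\\S+)\\s+#\\s*(?<reason>.*)$")
    ]' "$@"
}

# Usage (tmp.txt is the file from the question):
#   deny_lines_to_json tmp.txt
```

Because jq serializes the strings itself, the backslash in \003 should come out escaped without any extra work.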


CodePudding user response:

An awk-only solution, not as elegant as I hoped:

echo "${bbbbb}" \
\
| mawk '
  BEGIN { printf("%s","[")

     OFS = "\42, \42reason\42: \42"
      FS = "[ \\t][ \\t]+([\43][ ])?"

               _="[A]?[^D]+DENY[ ]IN[^0-9]+"
      RS = "^"(_)"|[\\r]?[\\n]("(_)")?"
  
  }                   !NF { next 
  } !/(port [0-9]+|[)])$/ { 
  
       $_=sprintf("%s\\n%*s",$_,(getline)^ RS,$_) 
  } { 
       gsub(/[\3]/,"\\003")
       gsub(/\\/,   "\\&")

       printf("%.*s{\42ip\42: \42%s\42 }",
                   _<__  , ",", $!(NF=NF)) 
  
  } END { printf("]\n") }' | jq

[
  {
    "ip": "5.255.250.115",
    "reason": "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
  },
  {
    "ip": "46.229.168.153",
    "reason": "Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification &003 from port 6493"
  },
  {
    "ip": "213.180.203.49",
    "reason": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi&non/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)"
  }
]
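
For comparison, here is a more readable awk sketch of the same idea. It is not the code above, just a minimal illustration under a few assumptions: each rule sits on one physical line (wrapped continuation lines are skipped), the IP address is the fourth whitespace-separated field, and the comment starts at the first "#". The names deny_to_json and json_escape are hypothetical. Escaping is done character by character rather than with gsub, because the handling of backslashes in gsub replacement strings differs between awk implementations. Only backslash, double quote, and tab are escaped; other control characters would need extra handling.

```shell
# deny_to_json and json_escape are hypothetical names used only in this sketch.
deny_to_json() {
  awk '
    # Escape the characters that most often break a JSON string:
    # backslash, double quote, and tab. Other control characters are not handled.
    function json_escape(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\" c
        else if (c == "\t")         out = out "\\t"
        else                        out = out c
      }
      return out
    }
    BEGIN { printf "[" }
    /DENY IN/ {
      ip = $4                                  # Anywhere DENY IN <ip> # <comment>
      reason = $0
      if (!sub(/^[^#]*#[ \t]*/, "", reason))   # keep only the text after the first "#"
        reason = ""
      printf "%s{\"ip\":\"%s\",\"reason\":\"%s\"}", sep, ip, json_escape(reason)
      sep = ","
    }
    END { print "]" }
  ' "$@"
}

# Usage (tmp.txt is the file from the question):
#   deny_to_json tmp.txt | python3 -m json.tool
```

The separator is printed before every object except the first, so no trailing comma has to be stripped with sed afterwards.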