I have lines like this in a file:
Anywhere DENY IN 5.255.250.115 # Mozilla/5.0 (compatible; YandexBot/3.0; http://yandex.com/bots)
Anywhere DENY IN 46.229.168.153 # Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \003 from port 6493
7
Anywhere DENY IN 213.180.203.49 # Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi
on/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; http://yandex.com/bots)
I would like to transform these lines, pulling out only the IP address and the comment at the end of each line.
Here is what I have so far:
grep 'DENY IN' tmp.txt | awk '{printf "{\"ip\":\"%s\",\"reason\":\"%s\"},", $4, substr($0,index($0,$5))}' | sed 's/,$//g' | awk '{ printf "[%s]", $0}'
[{"ip":"5.255.250.115","reason":"# Mozilla/5.0 (compatible; YandexBot/3.0; http://yandex.com/bots)"},{"ip":"46.229.168.153","reason":"# Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \003 from port 64347"},{"ip":"213.180.203.49","reason":"# Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi"}]
The issue is that the comment string is not a valid JSON value.
When I feed that to python3 -m json.tool, I get this error:
grep 'DENY IN' tmp.txt | awk '{printf "{\"ip\":\"%s\",\"reason\":\"%s\"},", $4, substr($0,index($0,$5))}' | sed 's/,$//g' | awk '{ printf "[%s]", $0}' | python3 -m json.tool
Invalid \escape: line 1 column 218 (char 217)
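For context, the parser is most likely tripping over the `\003` in the sshd line: JSON allows only a fixed set of characters after a backslash (`"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, or `u` plus four hex digits), so a literal backslash has to be doubled. A minimal Python sketch that reproduces the failure and shows the escaped form; the variable names and the shortened reason text are only illustrative:

```python
import json

# Reason text as it appears in the log: a literal backslash followed by 003.
reason = r"Bad protocol version identification \003 from port"

# Wrapping the raw text in quotes, as the awk printf does, yields invalid JSON.
naive = '{"reason":"%s"}' % reason
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError as exc:
    naive_ok = False
    print("rejected:", exc.msg)

# json.dumps doubles the backslash (and escapes quotes and control characters).
escaped = json.dumps({"reason": reason})
print(escaped)
```

The escaped string parses back to the original text, which is the behaviour any fix to the pipeline has to reproduce.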
Is there any way to print the JSON value as an escaped string literal in the JSON output, using awk or any other command-line tool?
Thanks
CodePudding user response:
Are you looking for something like this? As jq is tagged, this reads each line as a raw string using -R, splits the line on runs of at least two consecutive whitespace characters, and generates the object from the last two columns:
jq -nR '[inputs | [splits("\\s{2,}")] | {ip:.[2], reason:.[3]}]'
[
  {
    "ip": "5.255.250.115",
    "reason": "# Mozilla/5.0 (compatible; YandexBot/3.0; http://yandex.com/bots)"
  },
  {
    "ip": "46.229.168.153",
    "reason": "# Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \\003 from port 64937"
  },
  {
    "ip": "213.180.203.49",
    "reason": "# Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; http://yandex.com/bots)"
  }
]
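If jq is not available, the same split-and-serialize idea can be sketched in Python, which the question already uses for json.tool. This is only a rough equivalent, not a translation of the jq filter: the function name `rules_to_json` is made up, and the sample assumes (as the jq filter does) that the columns in the real file are separated by two or more whitespace characters.

```python
import json
import re

def rules_to_json(text):
    """Turn 'DENY IN' rule lines into a JSON array of {ip, reason} objects."""
    records = []
    for line in text.splitlines():
        if "DENY IN" not in line:
            continue
        # Assumption: columns are separated by two or more whitespace characters.
        # maxsplit=3 keeps any later double spaces inside the comment intact.
        columns = re.split(r"\s{2,}", line.strip(), maxsplit=3)
        if len(columns) < 4:
            continue
        records.append({"ip": columns[2], "reason": columns[3]})
    # json.dumps takes care of escaping backslashes and quotes.
    return json.dumps(records, indent=2)

# Hypothetical sample with multi-space column separators.
sample = (
    "Anywhere    DENY IN    5.255.250.115    # Mozilla/5.0 (compatible; YandexBot/3.0; http://yandex.com/bots)\n"
    "Anywhere    DENY IN    46.229.168.153    # Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \\003 from port 64937\n"
)
print(rules_to_json(sample))
```

Leaving the escaping to the JSON serializer avoids having to enumerate the special characters by hand.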
CodePudding user response:
An awk-only solution, not as elegant as I hoped:
echo "${bbbbb}" \
\
| mawk '
BEGIN { printf("%s","[")
OFS = "\42, \42reason\42: \42"
FS = "[ \\t][ \\t]+([\43][ ])?"
_="[A]?[^D]+DENY[ ]IN[^0-9]+"
RS = "^"(_)"|[\\r]?[\\n]("(_)")?"
} !NF { next
} !/(port [0-9]+|[)])$/ {
$_=sprintf("%s\\n%*s",$_,(getline)^ RS,$_)
} {
gsub(/[\3]/,"\\003")
gsub(/\\/, "\\&")
printf("%.*s{\42ip\42: \42%s\42 }",
_<__ , ",", $!(NF=NF))
} END { printf("]\n") }' | jq
[
  {
    "ip": "5.255.250.115",
    "reason": "Mozilla/5.0 (compatible; YandexBot/3.0; http://yandex.com/bots)"
  },
  {
    "ip": "46.229.168.153",
    "reason": "Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification &003 from port 6493"
  },
  {
    "ip": "213.180.203.49",
    "reason": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi&non/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; http://yandex.com/bots)"
  }
]
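The awk program above also tries to rejoin comments that were wrapped onto the following line. A rough Python sketch of that idea, under two assumptions that may not hold for the real file: any non-empty line without `DENY IN` continues the previous rule's comment, and the wrap did not swallow a space. The names `RULE` and `parse_wrapped` are hypothetical.

```python
import json
import re

# Capture the IP after "DENY IN" and the comment text after the "#".
RULE = re.compile(r"DENY IN\s+(\S+)\s+#\s*(.*)")

def parse_wrapped(text):
    """Parse rule lines, gluing wrapped continuation lines onto the comment."""
    records = []
    for line in text.splitlines():
        match = RULE.search(line)
        if match:
            records.append({"ip": match.group(1), "reason": match.group(2)})
        elif records and line.strip():
            # Assumption: a non-rule line continues the previous comment.
            records[-1]["reason"] += line.strip()
    return records

sample = """Anywhere DENY IN 46.229.168.153 # Sep 22 00:03:39 dn sshd[20969]: Bad protocol version identification \\003 from port 6493
7
Anywhere DENY IN 213.180.203.49 # Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Versi
on/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; http://yandex.com/bots)
"""
print(json.dumps(parse_wrapped(sample), indent=2))
```

Because the records are built as plain dictionaries and serialized at the end, the backslash in `\003` is escaped by the serializer rather than by a hand-written substitution.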