I'm trying to parse a Fortinet log in PHP. I took a log example from the Fortinet Cookbook.
This is my code with the regex. I want to create an array that uses each field name as the key and the field's value as the value. For example: [date]=>2019-05-10 [time]=>11:50:48 ... [srcip]=>172.16.200.254
$regex = '/[a-zA-Z]+=[0-9]{4}-[0-9]{2}-[0-9]{2} [a-zA-Z]+=[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]{1,3})? [a-zA-Z]+="[^"]*" [a-zA-Z]+="[a-zA-Z]+" [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+=[0-9]+ [a-zA-Z]+=\\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\\b [a-zA-Z]+=[0-9]+ [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+=\\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\\b [a-zA-Z]+=[0-9]+ [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+=[0-9]+ [a-zA-Z]+=[0-9]+ [a-zA-Z]+="[^"]*" [a-zA-Z]+=[0-9]+ [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+="[^"]*" [a-zA-Z]+=[0-9]+ [a-zA-Z]+=[0-9]+ [a-zA-Z]+=[0-9]+ [a-zA-Z]+=[0-9]+ [a-zA-Z]+=[0-9]+ [a-zA-Z]+="[^"]*"/i';
$str = 'date=2019-05-10 time=11:50:48 logid="0001000014" type="traffic" subtype="local" level="notice" vd="vdom1" eventtime=1557514248379911176 srcip=172.16.200.254 srcport=62024 srcintf="port11" srcintfrole="undefined" dstip=172.16.200.2 dstport=443 dstintf="vdom1" dstintfrole="undefined" sessionid=107478 proto=6 action="server-rst" policyid=0 policytype="local-in-policy" service="HTTPS" dstcountry="Reserved" srccountry="Reserved" trandisp="noop" app="Web Management(HTTPS)" duration=5 sentbyte=1247 rcvdbyte=1719 sentpkt=5 rcvdpkt=6 appcat="unscanned"';
preg_match_all($regex, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Answer:
Perhaps using a small pattern with a branch reset group will be sufficient, where group 1 contains the key and group 2 contains the value:
([^\s=]+)=(?|"([^"]*)"|(\S+))
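Inside a branch reset group (?|...), each alternative restarts the group numbering, so a quoted value and an unquoted value both land in group 2. The small sketch below compares it with a plain (?:...) alternation on two fields taken from the sample log; the variable names are only for illustration.

```php
<?php
// Branch reset group: both alternatives capture into group 2,
// so the value is always at index 2, quoted or not.
$withReset    = '/([^\s=]+)=(?|"([^"]*)"|(\S+))/';
// Plain alternation: a quoted value goes to group 2, an unquoted one to group 3.
$withoutReset = '/([^\s=]+)=(?:"([^"]*)"|(\S+))/';

$sample = 'srcip=172.16.200.254 app="Web Management(HTTPS)"';

preg_match_all($withReset, $sample, $a, PREG_SET_ORDER);
preg_match_all($withoutReset, $sample, $b, PREG_SET_ORDER);

// Both values are found in group 2
print_r(array_column($a, 2));
// srcip's entry is an empty string here, because its value was captured by group 3
print_r(array_column($b, 2));
```

Without the reset group you would have to check both group 2 and group 3 for every match.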
Example
$regex = '/([^\s=]+)=(?|"([^"]*)"|(\S+))/';
$str = 'date=2019-05-10 time=11:50:48 logid="0001000014" type="traffic" subtype="local" level="notice" vd="vdom1" eventtime=1557514248379911176 srcip=172.16.200.254 srcport=62024 srcintf="port11" srcintfrole="undefined" dstip=172.16.200.2 dstport=443 dstintf="vdom1" dstintfrole="undefined" sessionid=107478 proto=6 action="server-rst" policyid=0 policytype="local-in-policy" service="HTTPS" dstcountry="Reserved" srccountry="Reserved" trandisp="noop" app="Web Management(HTTPS)" duration=5 sentbyte=1247 rcvdbyte=1719 sentpkt=5 rcvdpkt=6 appcat="unscanned"';
preg_match_all($regex, $str, $matches, PREG_SET_ORDER, 0);
// Each match set is [full match, key, value]; collect them as key => value
$result = array_reduce($matches, function($carry, $item) {
    $carry[$item[1]] = $item[2];
    return $carry;
}, []);
print_r($result);
Output
Array
(
[date] => 2019-05-10
[time] => 11:50:48
[logid] => 0001000014
[type] => traffic
[subtype] => local
[level] => notice
[vd] => vdom1
[eventtime] => 1557514248379911176
[srcip] => 172.16.200.254
[srcport] => 62024
[srcintf] => port11
[srcintfrole] => undefined
[dstip] => 172.16.200.2
[dstport] => 443
[dstintf] => vdom1
[dstintfrole] => undefined
[sessionid] => 107478
[proto] => 6
[action] => server-rst
[policyid] => 0
[policytype] => local-in-policy
[service] => HTTPS
[dstcountry] => Reserved
[srccountry] => Reserved
[trandisp] => noop
[app] => Web Management(HTTPS)
[duration] => 5
[sentbyte] => 1247
[rcvdbyte] => 1719
[sentpkt] => 5
[rcvdpkt] => 6
[appcat] => unscanned
)
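The code above treats the whole input as a single log entry; if one string holds several log lines, repeated keys such as date would overwrite each other. A minimal sketch, assuming one log entry per line, that applies the same pattern line by line and builds each row with array_combine() instead of array_reduce(). The function name parse_fortinet_lines is hypothetical, and the sample lines are shortened versions of the ones used in this thread.

```php
<?php
// Hypothetical helper: parse a multi-line log into one [key => value] array
// per line, reusing the branch reset pattern from the answer above.
function parse_fortinet_lines(string $log): array
{
    $regex = '/([^\s=]+)=(?|"([^"]*)"|(\S+))/';
    $rows = [];
    // \R matches any newline sequence (\n, \r\n or \r); empty lines are dropped
    foreach (preg_split('/\R/', $log, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        preg_match_all($regex, $line, $m, PREG_SET_ORDER);
        if (!$m) {
            continue; // line without any key=value pair
        }
        // Group 1 holds the keys, group 2 the values; a repeated key keeps its last value
        $rows[] = array_combine(array_column($m, 1), array_column($m, 2));
    }
    return $rows;
}

// Shortened versions of the two sample lines used in this thread
$log = "date=2019-05-10 time=11:50:48 srcip=172.16.200.254 app=\"Web Management(HTTPS)\"\n"
     . "date=2020-05-10 time=11:50:48 srcip=172.16.200.2542 app=\"Web Management(HTTPS)2\"\n";

$rows = parse_fortinet_lines($log);
print_r($rows);
```

The result is a list with one associative array per log line, the same shape as the DOMDocument-based answer.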
Answer:
It may sound odd, but your log lines look like HTML attributes, so wrapping each line in an HTML tag and letting DOMDocument parse the attributes works fine.
<?php
$str = '
date=2019-05-10 time=11:50:48 logid="0001000014" type="traffic" subtype="local" level="notice" vd="vdom1" eventtime=1557514248379911176 srcip=172.16.200.254 srcport=62024 srcintf="port11" srcintfrole="undefined" dstip=172.16.200.2 dstport=443 dstintf="vdom1" dstintfrole="undefined" sessionid=107478 proto=6 action="server-rst" policyid=0 policytype="local-in-policy" service="HTTPS" dstcountry="Reserved" srccountry="Reserved" trandisp="noop" app="Web Management(HTTPS)" duration=5 sentbyte=1247 rcvdbyte=1719 sentpkt=5 rcvdpkt=6 appcat="unscanned"
date=2020-05-10 time=11:50:48 logid="0001000015" type="traffic2" subtype="local2" level="notice2" vd="vdom12" eventtime=15575142483799111762 srcip=172.16.200.2542 srcport=620242 srcintf="port112" srcintfrole="undefined2" dstip=172.16.200.22 dstport=4432 dstintf="vdom12" dstintfrole="undefined2" sessionid=1074782 proto=62 action="server-rst2" policyid=02 policytype="local-in-policy2" service="HTTPS2" dstcountry="Reserved2" srccountry="Reserved2" trandisp="noop2" app="Web Management(HTTPS)2" duration=52 sentbyte=12472 rcvdbyte=17192 sentpkt=52 rcvdpkt=62 appcat="unscanned2"
';
// Split the input into lines and drop the empty ones
$lines = preg_split("/\n/", $str);
$lines = array_filter($lines);

// Turn every log line into the attribute list of a dummy <tag> element
$html = "<div>\n";
foreach ($lines as $line)
    $html .= "\t<tag {$line}></tag>\n";
$html .= "</div>\n";

$html = load_html($html);
$xpath = new DOMXpath($html);
$tags = $xpath->query("//tag");

$result = [];
$i = 0;
foreach ($tags as $tag)
{
    if ($tag->hasAttributes())
    {
        // Each attribute becomes one key => value pair of row $i
        foreach ($tag->attributes as $attr)
        {
            $name = $attr->nodeName;
            $value = $attr->nodeValue;
            $result[$i][$name] = $value;
        }
        $i++;
    }
}
print_r($result);
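function load_html($str)
{
    // Load the HTML into a DOM document
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->preserveWhiteSpace = false;
    //@$dom->loadHTML("<?xml encoding=\"UTF-8\">".utf8_decode($str));
    // The <?xml encoding="UTF-8"> prefix makes libxml read the input as UTF-8;
    // @ silences the warnings libxml emits for the non-standard <tag> element
    @$dom->loadHTML("<?xml encoding=\"UTF-8\">".$str);
    $dom->formatOutput = true;
    // Dirty fix: remove the processing instruction created by the prefix
    foreach ($dom->childNodes as $item)
    {
        if ($item->nodeType == XML_PI_NODE)
            $dom->removeChild($item); // remove hack
    }
    return $dom;
}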
Output:
Array
(
[0] => Array
(
[date] => 2019-05-10
[time] => 11:50:48
[logid] => 0001000014
[type] => traffic
[subtype] => local
[level] => notice
[vd] => vdom1
[eventtime] => 1557514248379911176
[srcip] => 172.16.200.254
[srcport] => 62024
[srcintf] => port11
[srcintfrole] => undefined
[dstip] => 172.16.200.2
[dstport] => 443
[dstintf] => vdom1
[dstintfrole] => undefined
[sessionid] => 107478
[proto] => 6
[action] => server-rst
[policyid] => 0
[policytype] => local-in-policy
[service] => HTTPS
[dstcountry] => Reserved
[srccountry] => Reserved
[trandisp] => noop
[app] => Web Management(HTTPS)
[duration] => 5
[sentbyte] => 1247
[rcvdbyte] => 1719
[sentpkt] => 5
[rcvdpkt] => 6
[appcat] => unscanned
)
[1] => Array
(
[date] => 2020-05-10
[time] => 11:50:48
[logid] => 0001000015
[type] => traffic2
[subtype] => local2
[level] => notice2
[vd] => vdom12
[eventtime] => 15575142483799111762
[srcip] => 172.16.200.2542
[srcport] => 620242
[srcintf] => port112
[srcintfrole] => undefined2
[dstip] => 172.16.200.22
[dstport] => 4432
[dstintf] => vdom12
[dstintfrole] => undefined2
[sessionid] => 1074782
[proto] => 62
[action] => server-rst2
[policyid] => 02
[policytype] => local-in-policy2
[service] => HTTPS2
[dstcountry] => Reserved2
[srccountry] => Reserved2
[trandisp] => noop2
[app] => Web Management(HTTPS)2
[duration] => 52
[sentbyte] => 12472
[rcvdbyte] => 17192
[sentpkt] => 52
[rcvdpkt] => 62
[appcat] => unscanned2
)
)
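The regex in the question also validated the IP octets, which neither answer does: the second sample line, for example, parses srcip=172.16.200.2542 without complaint. If that check still matters, a minimal sketch is to validate the parsed fields afterwards with PHP's filter_var() and FILTER_VALIDATE_IP. The helper name invalid_ip_fields is hypothetical, and checking only srcip and dstip is an assumption based on the sample log.

```php
<?php
// Hypothetical helper: return the names of IP fields in a parsed row
// that are not valid IP addresses. srcip/dstip come from the sample log.
function invalid_ip_fields(array $row, array $fields = ['srcip', 'dstip']): array
{
    $bad = [];
    foreach ($fields as $field) {
        // filter_var() returns false when the value is not a valid IPv4/IPv6 address
        if (isset($row[$field]) && filter_var($row[$field], FILTER_VALIDATE_IP) === false) {
            $bad[] = $field;
        }
    }
    return $bad;
}

// Rows shaped like the parsed output above (only the IP fields kept)
$rows = [
    ['srcip' => '172.16.200.254',  'dstip' => '172.16.200.2'],
    ['srcip' => '172.16.200.2542', 'dstip' => '172.16.200.22'],
];

foreach ($rows as $i => $row) {
    $bad = invalid_ip_fields($row);
    echo "row $i: ", $bad ? implode(', ', $bad) : 'ok', "\n";
}
```

FILTER_VALIDATE_IP accepts both IPv4 and IPv6; passing FILTER_FLAG_IPV4 as the third argument restricts it to IPv4.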