I have a PHP script that parses a Telegram history message. The fields are comma-separated, but some values are grouped in parentheses. I have tried a lot of things; here is my current code, which does not parse correctly.
$message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";
$keywords = preg_split("/,(?![^(] \))/", $message);
foreach ($keywords as $value) {
    echo "$value<hr>";
}
I am looking to parse the date and message.
CodePudding user response:
You can chain a couple of regexes together and do some data transformation to get the result as a PHP array.
This output looks like it was printed by Python. Instead of writing a script to parse the Python dump, I'd recommend modifying the Python script to output something easier to parse.
$raw_message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";
// Capture everything between "Message(" and the last ")".
preg_match("/Message\((.*)\)/", $raw_message, $strip_message);
// Split on commas that are not inside a parenthesised group.
// Note: a comma inside the message text would also be split on.
$split_by_comma = array_map('trim', preg_split("/,(?![^()]*\))/", $strip_message[1]));
$message = [];
foreach ($split_by_comma as $element) {
    // Split on the first "=" only, so values containing "=" stay intact.
    $parts = explode('=', $element, 2);
    $key = $parts[0];
    $value = $parts[1] ?? '';
    if ($key === 'date') {
        // Pull the arguments out of datetime.datetime(...).
        preg_match("/\((.*)\)/", $value, $date_tmp);
        $date_split = explode(', ', $date_tmp[1]);
        // Year-month-day, then hour:minute:second; the tzinfo argument is ignored.
        $value = implode('-', array_slice($date_split, 0, 3)) . ' ' . implode(':', array_slice($date_split, 3, 3));
    } elseif ($key === 'message') {
        // Strip only the surrounding quotes, keeping apostrophes inside the text.
        $value = trim($value, "'");
    }
    $message[$key] = $value;
}
var_dump($message);
Results in:
array(28) {
["id"]=>
string(4) "1650"
["peer_id"]=>
string(34) "PeerChannel(channel_id=1286966173)"
["date"]=>
string(18) "2022-4-15 17:14:25"
["message"]=>
string(29) "Please check your email inbox"
["out"]=>
string(5) "False"
["mentioned"]=>
string(5) "False"
["media_unread"]=>
string(5) "False"
["silent"]=>
string(5) "False"
["post"]=>
string(4) "True"
["from_scheduled"]=>
string(5) "False"
["legacy"]=>
string(5) "False"
["edit_hide"]=>
string(5) "False"
["pinned"]=>
string(5) "False"
["from_id"]=>
string(4) "None"
["fwd_from"]=>
string(4) "None"
["via_bot_id"]=>
string(4) "None"
["reply_to"]=>
string(4) "None"
["media"]=>
string(4) "None"
["reply_markup"]=>
string(4) "None"
["entities"]=>
string(2) "[]"
["views"]=>
string(3) "382"
["forwards"]=>
string(1) "0"
["replies"]=>
string(4) "None"
["edit_date"]=>
string(4) "None"
["post_author"]=>
string(4) "None"
["grouped_id"]=>
string(4) "None"
["restriction_reason"]=>
string(2) "[]"
["ttl_period"]=>
string(4) "None"
}
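As for the suggestion above to change the Python side instead: here is a minimal sketch of what that could look like, assuming you control the script that prints the message. The helper name `message_to_json` is made up for illustration, and the `SimpleNamespace` object is only a stand-in for the real message object, with attribute names taken from the dump in the question.

```python
import json
from datetime import datetime, timezone
from types import SimpleNamespace


def message_to_json(msg):
    """Serialize only the fields the PHP side needs as one JSON line."""
    return json.dumps({
        "id": msg.id,
        # ISO 8601 keeps the timezone and is easy to parse in PHP.
        "date": msg.date.isoformat(),
        "message": msg.message,
    })


# Stand-in for the real message object (hypothetical; attribute names
# are taken from the dump in the question).
msg = SimpleNamespace(
    id=1650,
    date=datetime(2022, 4, 15, 17, 14, 25, tzinfo=timezone.utc),
    message="Please check your email inbox",
)

line = message_to_json(msg)
print(line)
```

On the PHP side, `json_decode($line, true)` then gives you an associative array directly, and commas, quotes, or parentheses inside the message text no longer matter.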