How to parse Telegram message by comma, but keep commas that are in parenthesis


I have a PHP script that parses a Telegram message history dump. The fields are comma-separated, but some values are wrapped in parentheses and contain commas of their own. I have tried a lot of things; here is my current code, which does not parse correctly.

$message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";
$keywords = preg_split("/,(?![^(]*\))/", $message);
foreach ($keywords as $value) {
    echo "$value<hr>";
}

I am looking to parse the date and message.

Answer:

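You can chain a couple of regexes together and do some data transformation to get the result as a PHP array: strip the outer Message(...) wrapper, split on the commas that are outside parentheses, then split each field on its first =. Stripping the wrapper matters: while it is there, every comma after the last inner parenthesis is followed by the wrapper's closing ), so the lookahead treats those commas as being inside parentheses and never splits on them.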
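A minimal sketch of the split step, using a shortened, hypothetical sample with the same shape as the dump. The same lookahead pattern is applied once with the Message(...) wrapper still in place and once after removing it; the variable names are illustrative only.

```php
<?php
// Shortened, hypothetical sample with the same shape as the dump in the question
$sample = "Message(id=1, date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='hi', out=False)";

// A comma counts as "inside parentheses" when a ")" follows it before any "(" or ")"
$pattern = '/,(?![^()]*\))/';

// 1) Split with the Message(...) wrapper still in place
$withWrapper = preg_split($pattern, $sample);

// 2) Strip the wrapper first (greedy match up to the last ")"), then split
preg_match('/Message\((.*)\)/', $sample, $m);
$withoutWrapper = array_map('trim', preg_split($pattern, $m[1]));

print_r($withWrapper);     // the date field and every field after it stay glued together
print_r($withoutWrapper);  // one element per field
```

With the wrapper present the split returns only two pieces; without it, each field lands in its own element.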

This output looks like it was printed by Python. Instead of parsing the Python dump in PHP, I'd recommend modifying the Python script to output something easier to parse, such as JSON.
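If the Python side can be changed, the PHP side shrinks to a json_decode call. The sketch below assumes a hypothetical JSON line with id, date, message and views keys, where date is an ISO 8601 string (for example what Python's datetime.isoformat() produces for a UTC timestamp); adjust the keys to whatever the Python script actually emits.

```php
<?php
// Hypothetical input: one JSON object per message, as a modified Python script might print it
$line = '{"id": 1650, "date": "2022-04-15T17:14:25+00:00", "message": "Please check your email inbox", "views": 382}';

// Decode to an associative array; JSON_THROW_ON_ERROR (PHP 7.3+) throws JsonException on malformed input
$data = json_decode($line, true, 512, JSON_THROW_ON_ERROR);

// An ISO 8601 string parses directly into a DateTimeImmutable, offset included
$date = new DateTimeImmutable($data['date']);

echo $date->format('Y-m-d H:i:s'), PHP_EOL;
echo $data['message'], PHP_EOL;
```

Types also survive the round trip: views comes back as an integer rather than the string "382".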

$raw_message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";

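$message = [];

// Capture everything between "Message(" and the last ")", dropping the trailing timestamp
$strip_message = [];
preg_match("/Message\((.*)\)/", $raw_message, $strip_message);

// Split on commas outside parentheses: a comma is skipped if a ")" follows before any "(" or ")"
// (a comma inside the message text itself would still cause a split)
$split_by_comma = array_map('trim', preg_split("/,(?![^()]*\))/", $strip_message[1]));

foreach ($split_by_comma as $element) {
    // The key is everything before the first "="; the rest is rejoined so values like PeerChannel(channel_id=...) stay intact
    $split_by_equals = explode('=', $element);
    $key = array_shift($split_by_equals);
    $value = implode('=', $split_by_equals);
    if ($key !== 'message' && $key !== 'date') {
        $message[$key] = $value;
    }

    if ($key === 'date') {
        // datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=...) -> "2022-4-15 17:14:25"
        $date_tmp = [];
        preg_match("/\((.*)\)/", $value, $date_tmp);
        $date_split = explode(', ', $date_tmp[1]);
        // Python's datetime repr omits the seconds when they are zero, so fall back to 0
        $seconds = isset($date_split[5]) && ctype_digit($date_split[5]) ? $date_split[5] : '0';
        $date = $date_split[0] . '-' . $date_split[1] . '-' . $date_split[2] . ' ' . $date_split[3] . ':' . $date_split[4] . ':' . $seconds;
        $message[$key] = $date;
    }

    if ($key === 'message') {
        // Drop only the surrounding quotes so apostrophes inside the text survive
        $message[$key] = substr($value, 1, -1);
    }
}

var_dump($message);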


Results in:

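array(28) {
  ["id"]=>
  string(4) "1650"
  ["peer_id"]=>
  string(34) "PeerChannel(channel_id=1286966173)"
  ["date"]=>
  string(18) "2022-4-15 17:14:25"
  ["message"]=>
  string(29) "Please check your email inbox"
  ["out"]=>
  string(5) "False"
  ["mentioned"]=>
  string(5) "False"
  ["media_unread"]=>
  string(5) "False"
  ["silent"]=>
  string(5) "False"
  ["post"]=>
  string(4) "True"
  ["from_scheduled"]=>
  string(5) "False"
  ["legacy"]=>
  string(5) "False"
  ["edit_hide"]=>
  string(5) "False"
  ["pinned"]=>
  string(5) "False"
  ["from_id"]=>
  string(4) "None"
  ["fwd_from"]=>
  string(4) "None"
  ["via_bot_id"]=>
  string(4) "None"
  ["reply_to"]=>
  string(4) "None"
  ["media"]=>
  string(4) "None"
  ["reply_markup"]=>
  string(4) "None"
  ["entities"]=>
  string(2) "[]"
  ["views"]=>
  string(3) "382"
  ["forwards"]=>
  string(1) "0"
  ["replies"]=>
  string(4) "None"
  ["edit_date"]=>
  string(4) "None"
  ["post_author"]=>
  string(4) "None"
  ["grouped_id"]=>
  string(4) "None"
  ["restriction_reason"]=>
  string(2) "[]"
  ["ttl_period"]=>
  string(4) "None"
}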
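Since the question only needs the date and the message, a lighter option is to skip the full split and pull those two fields straight out of the raw dump with targeted regexes. This is a sketch assuming the repr format shown in the question; it also zero-pads the date parts and allows for Python leaving out a zero seconds value.

```php
<?php
$raw_message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";

$date = null;
$text = null;

// date=datetime.datetime(Y, M, D, h, m[, s], ...): Python's datetime repr
// leaves out the seconds when they are zero, so that group is optional
if (preg_match('/date=datetime\.datetime\((\d+), (\d+), (\d+), (\d+), (\d+)(?:, (\d+))?/', $raw_message, $m)) {
    $date = sprintf('%04d-%02d-%02d %02d:%02d:%02d', $m[1], $m[2], $m[3], $m[4], $m[5], $m[6] ?? 0);
}

// message='...': everything up to the closing quote, stepping over \' escapes.
// Escape sequences are kept as-is; when the text contains an apostrophe,
// Python usually prints it in double quotes, which this pattern does not handle
if (preg_match("/message='((?:[^'\\\\]|\\\\.)*)'/", $raw_message, $m)) {
    $text = $m[1];
}

echo $date, PHP_EOL;
echo $text, PHP_EOL;
```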