How to split a Telegram message on commas, but keep commas that are inside parentheses

Time:04-19

I have a PHP script that parses a message from a Telegram history dump. The fields are comma-separated, but some values are grouped in parentheses and contain commas of their own. I have tried several approaches. Here is my current code, which does not split the string correctly.

$message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";
$keywords = preg_split("/,(?![^(] \))/", $message);
foreach($keywords as $value){
    echo "$value<hr>";
}

I am looking to parse the date and message.

Answer:

You can chain a couple of regexes together and do some data transformation to get the result as a PHP array.

This output looks like it was printed by Python. Rather than writing a script to parse the Python dump, I'd recommend modifying the Python script to output something easier to parse, such as JSON.
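To illustrate why a structured format is easier, here is a minimal sketch of the PHP side only. It assumes, hypothetically, that the Python script has been changed to print one JSON object per message, with `date` as an ISO 8601 string and `message` as plain text. The field names and the date format are assumptions for illustration, not something the original script is known to produce.

```php
<?php
// Hypothetical JSON line, assumed to come from a modified Python script.
$json = '{"id": 1650, "date": "2022-04-15T17:14:25+00:00", "message": "Please check your email inbox"}';

// Decode into an associative array; throw on malformed input (PHP 7.3+)
// instead of silently returning null.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Parse the ISO 8601 timestamp, then format it as needed.
$date = new DateTimeImmutable($data['date']);

echo $date->format('Y-m-d H:i:s'), "\n"; // 2022-04-15 17:14:25
echo $data['message'], "\n";             // Please check your email inbox
```

With JSON there is no splitting or quote handling to get wrong. When the Python side cannot be changed, the regex-based code that follows still applies.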

$raw_message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";

// Strip the outer "Message(" ... ")" wrapper; the greedy .* runs to the last ")".
preg_match("/Message\((.*)\)/", $raw_message, $strip_message);

// Split on commas that are not inside a (...) group: a comma is skipped
// when the next parenthesis after it is a closing one.
$split_by_comma = array_map('trim', preg_split("/,(?![^()]*\))/", $strip_message[1]));

$message = [];

foreach ($split_by_comma as $element) {
    // Split on the first "=" only, so values that contain "=" stay intact.
    [$key, $value] = array_pad(explode('=', $element, 2), 2, '');

    if ($key === 'date') {
        // Pull the arguments out of datetime.datetime(...) and rebuild
        // "year-month-day hour:minute:second" (values are not zero-padded).
        preg_match("/\((.*)\)/", $value, $date_tmp);
        $date_split = explode(', ', $date_tmp[1]);
        $message[$key] = implode('-', array_slice($date_split, 0, 3))
            . ' ' . implode(':', array_slice($date_split, 3, 3));
    } elseif ($key === 'message') {
        // Remove only the surrounding quotes; apostrophes inside the text are kept.
        $message[$key] = trim($value, "'");
    } else {
        $message[$key] = $value;
    }
}

var_dump($message);

Results in:

array(28) {
  ["id"]=>
  string(4) "1650"
  ["peer_id"]=>
  string(34) "PeerChannel(channel_id=1286966173)"
  ["date"]=>
  string(18) "2022-4-15 17:14:25"
  ["message"]=>
  string(29) "Please check your email inbox"
  ["out"]=>
  string(5) "False"
  ["mentioned"]=>
  string(5) "False"
  ["media_unread"]=>
  string(5) "False"
  ["silent"]=>
  string(5) "False"
  ["post"]=>
  string(4) "True"
  ["from_scheduled"]=>
  string(5) "False"
  ["legacy"]=>
  string(5) "False"
  ["edit_hide"]=>
  string(5) "False"
  ["pinned"]=>
  string(5) "False"
  ["from_id"]=>
  string(4) "None"
  ["fwd_from"]=>
  string(4) "None"
  ["via_bot_id"]=>
  string(4) "None"
  ["reply_to"]=>
  string(4) "None"
  ["media"]=>
  string(4) "None"
  ["reply_markup"]=>
  string(4) "None"
  ["entities"]=>
  string(2) "[]"
  ["views"]=>
  string(3) "382"
  ["forwards"]=>
  string(1) "0"
  ["replies"]=>
  string(4) "None"
  ["edit_date"]=>
  string(4) "None"
  ["post_author"]=>
  string(4) "None"
  ["grouped_id"]=>
  string(4) "None"
  ["restriction_reason"]=>
  string(2) "[]"
  ["ttl_period"]=>
  string(4) "None"
}
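The lookahead regex works for this sample, but it splits in the wrong place when a comma appears inside quoted message text or before a nested group such as `Outer(a=1, inner=Inner(...))`. As an alternative, here is a minimal sketch of a depth-counting splitter. `split_top_level` is a hypothetical helper name, not part of the code above, and it assumes brackets are balanced and strings use Python-style single or double quotes.

```php
<?php
// Split on top-level commas only. Commas inside (), [] or quoted strings
// are kept. Hypothetical helper for illustration.
function split_top_level(string $input): array
{
    $parts = [];
    $current = '';
    $depth = 0;      // current nesting level of () and []
    $quote = null;   // the open quote character, or null when outside a string
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            // Inside a string: a backslash escapes the next character.
            if ($char === '\\' && $i + 1 < $length) {
                $current .= $char . $input[++$i];
                continue;
            }
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === "'" || $char === '"') {
            $quote = $char;
        } elseif ($char === '(' || $char === '[') {
            $depth++;
        } elseif ($char === ')' || $char === ']') {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            // Top-level comma: close the current field.
            $parts[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $char;
    }

    if (trim($current) !== '') {
        $parts[] = trim($current);
    }
    return $parts;
}

$fields = split_top_level("id=1, peer=Outer(a=1, inner=Inner(b=2, c=3)), message='Hi, there', entities=[]");
print_r($fields);
```

For this input the function returns four fields, with the nested `Outer(...)` group and the quoted `'Hi, there'` each left in one piece. It could replace the `preg_split` call in the code above, with the rest of the loop unchanged.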