PHP: Need help parsing a Telegram message by comma while keeping commas that are inside parentheses

Time:04-18

I have a PHP script that parses a Telegram history message. The message is comma-separated, but some fields are grouped by parentheses that contain commas of their own. I have tried a lot of things. Here is my current code, which does not parse correctly.

$message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";
$keywords = preg_split("/,(?![^(]*\))/", $message);
foreach($keywords as $value){
    echo "$value<hr>";
}

I am looking to parse the date and message.

Answer:

You can chain a couple of regexes together and do some data transformation to get the result as a PHP array.

This output looks like it was printed by Python. Instead of writing a script to parse the Python dump, I'd recommend modifying the Python script to output something easier to parse.
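As a rough illustration of why an easier format helps, here is a minimal sketch of the PHP side, assuming the Python script were changed to print JSON. The JSON string below is hypothetical: the field names mirror the dump, but the exact shape (including the ISO 8601 date) is an assumption, not something the original script produces.

```php
<?php
// Hypothetical JSON emitted by a modified Python script (assumed shape).
$json = '{"id": 1650, "date": "2022-04-15T17:14:25+00:00", "message": "Please check your email inbox"}';

// JSON_THROW_ON_ERROR (PHP 7.3+) raises a JsonException on malformed input
// instead of silently returning null.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// An ISO 8601 timestamp can be handed straight to DateTimeImmutable.
$date = new DateTimeImmutable($data['date']);

echo $date->format('Y-m-d H:i:s'), "\n";
echo $data['message'], "\n";
```

With structured input like this, commas and parentheses inside the message text need no special handling at all.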

$raw_message = "Message(id=1650, peer_id=PeerChannel(channel_id=1286966173), date=datetime.datetime(2022, 4, 15, 17, 14, 25, tzinfo=datetime.timezone.utc), message='Please check your email inbox', out=False, mentioned=False, media_unread=False, silent=False, post=True, from_scheduled=False, legacy=False, edit_hide=False, pinned=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to=None, media=None, reply_markup=None, entities=[], views=382, forwards=0, replies=None, edit_date=None, post_author=None, grouped_id=None, restriction_reason=[], ttl_period=None)04-18-2022 01:25am";

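// Capture everything between "Message(" and the last ")".
$strip_message = [];
preg_match("/Message\((.*)\)/", $raw_message, $strip_message);

// Split on commas that are outside a (...) group: the lookahead rejects any
// comma that reaches a ")" without crossing another parenthesis first.
$split_by_comma = array_map('trim', preg_split("/,(?![^()]*\))/", $strip_message[1]));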

$message=[];

foreach ($split_by_comma as $element) {
    // The key is the text before the first "="; the rest (which may itself
    // contain "=") is the value.
    $split_by_equals = explode('=', $element);
    $key = array_shift($split_by_equals);
    $value = implode('=', $split_by_equals);
    if ($key !== 'message' && $key !== 'date') {
        $message[$key] = $value;
        continue;
    }

    if ($key === 'date') {
        // Pull the arguments out of datetime.datetime(...) and rebuild them
        // as "Y-m-d H:i:s" (the parts are not zero-padded).
        $date_tmp = [];
        preg_match("/\((.*)\)/", $value, $date_tmp);
        $date_split = explode(', ', $date_tmp[1]);
        $date = $date_split[0] . '-' . $date_split[1] . '-' . $date_split[2] . ' ' . $date_split[3] . ':' . $date_split[4] . ':' . $date_split[5];
        $message[$key] = $date;
    }

    if ($key === 'message') {
        // Strip only the surrounding quotes so apostrophes inside the text survive.
        $message[$key] = trim($value, "'");
    }
}

var_dump($message);

Results in:

array(28) {
  ["id"]=>
  string(4) "1650"
  ["peer_id"]=>
  string(34) "PeerChannel(channel_id=1286966173)"
  ["date"]=>
  string(18) "2022-4-15 17:14:25"
  ["message"]=>
  string(29) "Please check your email inbox"
  ["out"]=>
  string(5) "False"
  ["mentioned"]=>
  string(5) "False"
  ["media_unread"]=>
  string(5) "False"
  ["silent"]=>
  string(5) "False"
  ["post"]=>
  string(4) "True"
  ["from_scheduled"]=>
  string(5) "False"
  ["legacy"]=>
  string(5) "False"
  ["edit_hide"]=>
  string(5) "False"
  ["pinned"]=>
  string(5) "False"
  ["from_id"]=>
  string(4) "None"
  ["fwd_from"]=>
  string(4) "None"
  ["via_bot_id"]=>
  string(4) "None"
  ["reply_to"]=>
  string(4) "None"
  ["media"]=>
  string(4) "None"
  ["reply_markup"]=>
  string(4) "None"
  ["entities"]=>
  string(2) "[]"
  ["views"]=>
  string(3) "382"
  ["forwards"]=>
  string(1) "0"
  ["replies"]=>
  string(4) "None"
  ["edit_date"]=>
  string(4) "None"
  ["post_author"]=>
  string(4) "None"
  ["grouped_id"]=>
  string(4) "None"
  ["restriction_reason"]=>
  string(2) "[]"
  ["ttl_period"]=>
  string(4) "None"
}
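
One caveat with the regex split: a comma inside the message text, or parentheses nested more than one level deep, can still cause a split in the wrong place. If that becomes a problem, a small character scanner that tracks nesting depth and quotes is one alternative. The following is only a sketch; split_top_level() is a hypothetical helper name, and the sample string is a shortened, made-up variant of the dump.

```php
<?php
/**
 * Split $input on commas that sit outside (), [] and quoted strings.
 * Hypothetical helper, not part of any library.
 */
function split_top_level(string $input): array
{
    $parts = [];
    $current = '';
    $depth = 0;      // current nesting level of () and []
    $quote = null;   // the quote character we are inside, or null
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            // Copy an escaped character (e.g. \') verbatim so it cannot end the string.
            if ($char === '\\' && $i + 1 < $length) {
                $current .= $char . $input[++$i];
                continue;
            }
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === "'" || $char === '"') {
            $quote = $char;
        } elseif ($char === '(' || $char === '[') {
            $depth++;
        } elseif ($char === ')' || $char === ']') {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            // A top-level comma ends the current field.
            $parts[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $char;
    }

    if (trim($current) !== '') {
        $parts[] = trim($current);
    }
    return $parts;
}

// Made-up sample with a comma and parentheses inside the message text.
$inner = "id=1, peer_id=PeerChannel(channel_id=2), date=datetime.datetime(2022, 4, 15, tzinfo=datetime.timezone.utc), message='Hi, there (again)', entities=[]";

$fields = [];
foreach (split_top_level($inner) as $pair) {
    // Limit 2 keeps any further "=" signs inside the value.
    [$key, $value] = explode('=', $pair, 2);
    $fields[$key] = $value;
}

var_dump($fields['message']);
```

The date and message values can then be post-processed the same way as in the loop above.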