Parse comma-separated text between parentheses as array of key-value pairs

Time:10-20

I am trying to parse 1 line that is constructed in this format:

Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)

I have this working perfectly in C# using named capture groups, but this question is strictly about PHP, and I have no idea how to separate each field and build an associative array I can iterate.

I can retrieve the first item in double-quotes "textfile1.txt" using

$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
preg_match("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\'|[^'])+'))/is", $string, $match);
print_r($match);
Array
(
    [0] => "textfile1.txt"
)

I can't figure it out. I have tried different expressions to handle both the string and long fields, but no luck.

Is there something I am missing?

The end result should be each filename/size pair added to an array that I can access later.

Any help is appreciated

https://regex101.com/r/naSdng/1

My C# implementation looks like this:

MatchCollection result = Regex.Matches(file, @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<filename>.*?)""|'(?<filename>.*?)')\s*,\s*(?<filesize>\d+)");
matchCol = result;
foreach (Match match in result)
{
    ListViewItem ItemArray = new(new string[] {
        match.Groups["filename"].Value.Trim(), BytesToReadableString(Convert.ToInt64(match.Groups["filesize"].Value)), "Ready"
    });
    fileList.Items.Add(ItemArray);
}

CodePudding user response:

The regex you have shown in C# can be easily adapted to work in PHP as well.

You may use:

(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)

Note that I have refactored your regex a bit to make it more efficient.

Code:

<?php
$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';

if (preg_match_all('/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/', $s, $m)) {
   $out = array_combine ( $m['filename'], $m['filesize'] );
   print_r($out);
}
?>

Output:

Array
(
    [textfile1.txt] => 7268474425
    [textfile2.txt] => 661204928
    [textfile3.txt] => 121034
)

RegEx Details:

  • (?:: Start a non-capture group
    • \w+\(\h*: Match 1+ word characters followed by ( and 0 or more horizontal whitespaces
    • |: OR
    • (?<!\A)\G: Start matching from end of the previous match
    • \h*,\h*: Match comma surrounded with 0 or more whitespaces
  • ): End non-capture group
  • "(?<filename>[^"]+)": Match a double-quoted string, with named capture group filename matching 1+ of any char that is not a "
  • \h*,\h*: Match comma surrounded with 0 or more whitespaces
  • (?<filesize>\d+): Named capture group filesize to match 1+ digits
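
To illustrate how the \G alternative chains one pair onto the previous one, here is a minimal sketch that wraps the pattern in a helper. The function name parseFileList() and the (int) cast are assumptions for illustration, not part of the original code, and the cast assumes a 64-bit PHP build. The second call shows that a pair which does not directly follow the previous match is not picked up.

```php
<?php
// Hypothetical helper (name is an assumption): applies the pattern described above.
function parseFileList(string $s): array
{
    $re = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';
    $out = [];
    if (preg_match_all($re, $s, $m, PREG_SET_ORDER)) {
        foreach ($m as $pair) {
            // Assumption: sizes fit into a 64-bit PHP integer.
            $out[$pair['filename']] = (int) $pair['filesize'];
        }
    }
    return $out;
}

// Every pair directly follows the previous one, so \G keeps the chain going.
$chained = parseFileList('Files("a.txt", 10, "b.txt", 20)');

// The text ") and " breaks the chain, so the second pair is not matched.
$broken = parseFileList('Files("a.txt", 10) and "b.txt", 20');

var_export($chained);
var_export($broken);
```

Using PREG_SET_ORDER here keeps each filename next to its size, which makes the cast per pair straightforward.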

CodePudding user response:

Convert the input string into a valid JSON string and decode it to ensure that the numeric values are cast as integers. Chunk the flat array into pairs and assign each pair as an associative element in the result array.

Code:

var_export(
    array_reduce(
        array_chunk(
            json_decode('[' . substr($string, 6, -1) . ']'),
            2
        ),
        function ($result, $row) {
            $result[$row[0]] = $row[1];
            return $result;
        }
    )
);
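
A possible variation on the same idea, offered only as a sketch: it locates the parentheses with strpos()/strrpos() instead of assuming the prefix is exactly six characters long, and pairs the values with array_column(). It assumes PHP 7.3+ (for JSON_THROW_ON_ERROR), a 64-bit build, and that the text between the parentheses is valid JSON array content, as it is for the sample input.

```php
<?php
$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';

// Find the outer parentheses rather than hard-coding the offset 6.
$open  = strpos($string, '(');
$close = strrpos($string, ')');
$inner = substr($string, $open + 1, $close - $open - 1);

// Decode the inner text as a flat JSON array; throws JsonException on bad input.
$flat = json_decode('[' . $inner . ']', true, 512, JSON_THROW_ON_ERROR);

// Chunk into [name, size] pairs; index 1 becomes the value, index 0 the key.
$result = array_column(array_chunk($flat, 2), 1, 0);

var_export($result);
```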

or split the inner text on every second comma-space and parse the comma-separated strings with sscanf().

Code:

var_export(
    array_reduce(
        preg_split('/[^,]+,[^,]+\K, /', substr($string, 6, -1)),
        function ($result, $string) {
            [$key, $result[$key]] = sscanf($string, '"%[^"]", %d');
            return $result;
        }
    )
);
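
To make the two steps easier to follow, here is a small sketch on shortened sample data (the values are made up for illustration). It first shows what the split on every second comma-space produces, then what sscanf() returns for a single pair.

```php
<?php
// Shortened, made-up inner text in the same shape as the original input.
$inner = '"a.txt", 1, "b.txt", 2';

// \K discards the "name, size" part from the match, so only the
// separating ", " after every second comma is used as the split point.
$pairs = preg_split('/[^,]+,[^,]+\K, /', $inner);

// %[^"] reads everything up to the next double quote; %d reads an integer.
[$name, $size] = sscanf($pairs[0], '"%[^"]", %d');

var_export($pairs);
var_dump($name, $size);
```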

or use preg_match_all() with the \G (continue metacharacter) then pair up the results in a foreach() so that you can explicitly cast the numbers as int-type values.

Code:

$result = [];
preg_match_all('/(?:^\w+\(|\G, )"([^"]+)", (\d+)/', $string, $matches, PREG_SET_ORDER);
foreach ($matches as [1 => $key, 2 => $val]) {
    $result[$key] = (int) $val;
}
var_export($result);

or iterate over each individual value after exploding the content inside of the parentheses. Then toggle the usage of the given string to determine keys and values.

Code:

$result = [];
foreach (explode(', ', substr($string, 6, -1)) as $val) {
    if (!isset($key)) {
        $key = trim($val, '"');
    } else {
        $result[$key] = (int) $val;
        unset($key);
    }
}
var_export($result);