I've been trying to print the input in the format shown in the comment section of the code below, but I can't get it right however hard I try. Can you please help me correct it?
<?php
header('Content-type: text/plain');
/*
input:
1,"3,4",5,6,"7,8,9",10,11
o/p:
1
3,4
5
6
7,8,9
10
11
*/
$str = '1,"3,4",5,6,"7,8,9",10,11';
//print_r(explode(",", $str));
$arr = explode(",", $str);
$end_qte = false;
$start_qte = false;
foreach($arr as $elm) {
    if (stripos($elm,'"') === FALSE && $end_qte) {
        echo $elm . "\n";
        echo "hi";
        $end_qte = false;
    } else if ($start_qte) {
        echo "," . $elm;
    } else if (stripos($elm,'"') == 0) {
        echo trim($elm,"\"");
        $start_qte = true;
    } else if (stripos($elm,'"') == 1) {
        echo "," . trim($elm,"\"") . "\n";
        $end_qte = true;
        $start_qte = false;
    }
}
?>
Answer:
Use my "state machine" to parse the string (based on a similar recent answer of mine in JavaScript):
$str = '1,"3,4",5,6,"7,8,9",10,11';

function tokenize($str)
{
    $state = "normal";
    $tokens = [];
    $current = "";
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if ($state == "normal") {
            if ($c == ',') {
                // a comma outside quotes ends the current token
                // (compare with "" so that a token like "0" is not dropped)
                if ($current !== "") {
                    $tokens[] = $current;
                    $current = "";
                }
                continue;
            }
            if ($c == '"') {
                // an opening quote switches to the "quotes" state
                $state = "quotes";
                $current = "";
                continue;
            }
            $current .= $c;
        }
        if ($state == "quotes") {
            if ($c == '"') {
                // the closing quote ends the quoted token
                $state = "normal";
                $tokens[] = $current;
                $current = "";
                continue;
            }
            // inside quotes, commas are ordinary characters
            $current .= $c;
        }
    }
    if ($current !== "") {
        $tokens[] = $current;
    }
    return $tokens;
}

$result = tokenize($str);
print_r($result);
/*
Array
(
[0] => 1
[1] => 3,4
[2] => 5
[3] => 6
[4] => 7,8,9
[5] => 10
[6] => 11
)
*/
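For comparison, the input here is a single CSV line, and PHP ships a built-in parser for that, str_getcsv(), which applies the same quote-aware rule as the state machine: commas inside double quotes are ordinary characters. A minimal sketch, assuming the input follows ordinary CSV quoting:

```php
<?php
// Sketch: let PHP's built-in CSV parser handle the quoting rules.
$str = '1,"3,4",5,6,"7,8,9",10,11';

// Separator, enclosure and escape characters are passed explicitly for clarity.
$fields = str_getcsv($str, ",", "\"", "\\");

// Prints one field per line: 1, 3,4, 5, 6, 7,8,9, 10, 11
echo implode("\n", $fields), "\n";
```

Unlike the hand-written tokenizer, str_getcsv() keeps empty fields (1,,2 yields an empty string in the middle) and treats a doubled quote ("") inside a quoted field as a literal quote character.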
Answer:
This is simple with a regex:
$str = '1,"3,4",5,6,"7,8,9",10,11';
$matches = [];
preg_match_all('/(".*?")|([^,]+)/', $str, $matches);
$withoutQuotes = array_map(fn($e) => str_replace('"', '', $e), $matches[0]);
echo implode("\n", $withoutQuotes);
This prints:
1
3,4
5
6
7,8,9
10
11
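A variation on the regex above, sketched under the same assumption that fields contain no escaped quotes: a PCRE branch-reset group, (?|...), makes both alternatives store their text in capture group 1, so the pattern itself drops the enclosing quotes and no str_replace() pass is needed.

```php
<?php
$str = '1,"3,4",5,6,"7,8,9",10,11';

// (?| ... ) is a PCRE branch-reset group: each alternative's capture is
// numbered 1, so $matches[1] collects every field without its quotes.
//   "([^"]*)"  a quoted field; the capture excludes the enclosing quotes
//   ([^,"]+)   a run of ordinary characters up to the next comma or quote
preg_match_all('/(?|"([^"]*)"|([^,"]+))/', $str, $matches);

$fields = $matches[1];
echo implode("\n", $fields), "\n";
```

Like the original pattern, this skips empty unquoted fields such as the middle of 1,,2, because [^,"]+ needs at least one character to match.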
Answer:
There are a number of ways to parse this.
But I think it's a mistake to split the string on every comma, including the ones inside a quoted string. Doing so makes it difficult to distinguish commas that are delimiters from commas inside quotes (which are just ordinary characters).
Instead, I'd split things up by repeatedly looking for the leftmost comma or quote.
I think the easiest way is with regexes.
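In PHP, that leftmost-comma-or-quote idea might be sketched like this. splitFields() is a hypothetical helper name, and the sketch assumes quotes only enclose whole fields and are never escaped:

```php
<?php
// splitFields() is a hypothetical helper name used for this sketch.
// Assumes quotes only enclose whole fields and are never escaped.
function splitFields(string $str): array
{
    $fields = [];
    while ($str !== '') {
        if (preg_match('/^[^,"]+/', $str, $m)) {
            // Ordinary text up to the leftmost comma or quote is a field.
            $fields[] = $m[0];
            $consumed = strlen($m[0]);
        } elseif ($str[0] === '"') {
            // Quoted field: everything up to the closing quote (or the end of
            // the string if the quote is never closed); may be empty.
            preg_match('/^"([^"]*)"?/', $str, $m);
            $fields[] = $m[1];
            $consumed = strlen($m[0]);
        } else {
            // The only remaining possibility is a comma delimiter: skip it.
            $consumed = 1;
        }
        // Drop whatever was just consumed from the front of the string.
        $str = substr($str, $consumed);
    }
    return $fields;
}

echo implode("\n", splitFields('1,"3,4",5,6,"7,8,9",10,11')), "\n";
```

Each pass inspects only the front of the remaining string, so a comma is treated as a delimiter only when it is reached outside a quoted field.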
Since you originally used many language tags, here's a Perl solution:
#!/usr/bin/perl
# split -- split up string

master(@ARGV);
exit(0);

# master -- master control
sub master
{
    $opt_d = $ENV{"DEBUG"} != 0;

    $str = '1,"3,4",5,6,"7,8,9",10,11';

    while ($str ne "") {
        dbgprt("master: LOOP str='%s'\n",$str);

        # add ordinary text
        if ($str =~ s/^([^,"]+)//) {
            $out = $1;
            dbgprt("master: NORM out='%s' str='%s'\n",$out,$str);
            push(@out,$out);
            next;
        }

        # get the delimiter: either comma or quote
        $str =~ s/^(.)//;
        $out = $1;
        dbgprt("master: DLM out='%s' str='%s'\n",$out,$str);

        # handle a quoted string
        if ($out eq '"') {
            # get all leading non-quote chars (possibly none)
            $str =~ s/^([^"]*)//;
            $out = $1;
            push(@out,$out);

            # strip trailing quote
            $str =~ s/^"//;
            dbgprt("master: QUO out='%s' str='%s'\n",$out,$str);
        }
    }

    foreach $out (@out) {
        printf("%s\n",$out);
    }
}

sub dbgprt
{
    printf(@_)
        if ($opt_d);
}
Here is the debug output (with the DEBUG environment variable set to a non-zero value):
master: LOOP str='1,"3,4",5,6,"7,8,9",10,11'
master: NORM out='1' str=',"3,4",5,6,"7,8,9",10,11'
master: LOOP str=',"3,4",5,6,"7,8,9",10,11'
master: DLM out=',' str='"3,4",5,6,"7,8,9",10,11'
master: LOOP str='"3,4",5,6,"7,8,9",10,11'
master: DLM out='"' str='3,4",5,6,"7,8,9",10,11'
master: QUO out='3,4' str=',5,6,"7,8,9",10,11'
master: LOOP str=',5,6,"7,8,9",10,11'
master: DLM out=',' str='5,6,"7,8,9",10,11'
master: LOOP str='5,6,"7,8,9",10,11'
master: NORM out='5' str=',6,"7,8,9",10,11'
master: LOOP str=',6,"7,8,9",10,11'
master: DLM out=',' str='6,"7,8,9",10,11'
master: LOOP str='6,"7,8,9",10,11'
master: NORM out='6' str=',"7,8,9",10,11'
master: LOOP str=',"7,8,9",10,11'
master: DLM out=',' str='"7,8,9",10,11'
master: LOOP str='"7,8,9",10,11'
master: DLM out='"' str='7,8,9",10,11'
master: QUO out='7,8,9' str=',10,11'
master: LOOP str=',10,11'
master: DLM out=',' str='10,11'
master: LOOP str='10,11'
master: NORM out='10' str=',11'
master: LOOP str=',11'
master: DLM out=',' str='11'
master: LOOP str='11'
master: NORM out='11' str=''
1
3,4
5
6
7,8,9
10
11