I am writing a PHP file that takes the contents of a web page, filters for full-width numbers, and converts them to half-width. Currently, my program returns all full-width characters on the page, not just the numbers.
<?php
$fullwidthPattern = '/([０-９])/';
$handle = curl_init();
$url = (URL removed for privacy reasons);
function getFullWidth(string $input) {
    global $fullwidthPattern;
    return preg_match($fullwidthPattern, $input);
}
curl_setopt($handle, CURLOPT_URL, $url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($handle);
curl_close($handle);
function jp_str_split($str) {
    $pattern = '/(?<!^)(?!$)/u';
    return preg_split($pattern, $str);
}
$jpContents = jp_str_split($output);
$numbers = array_filter($jpContents, 'getFullWidth');
foreach ($numbers as $x) {
    echo $x;
}
My regular expression is currently '/([０-９])/', but I have also tried '/[０-９]/' and '/[０１２３４５６７８９]/'.
CodePudding user response:
Splitting should be done with
function jp_str_split($str) {
    preg_match_all('/\X/u', $str, $matches);
    return $matches[0];
}
The \X construct matches a whole Unicode grapheme cluster (a base character together with any combining marks that follow it). Your (?<!^)(?!$) regex matches empty positions inside the string instead of characters, so at best it splits between individual code points, and without the u flag it splits between bytes.
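To make the difference concrete, here is a small sketch comparing the two ways of splitting. The helper names splitGraphemes and splitCodePoints and the sample string are made up for illustration; the sample is a decomposed "が" (か followed by the combining voiced sound mark U+3099) and a full-width "１". It assumes PHP 7 or later for the \u{...} escapes.

```php
<?php
// Hypothetical helper: split into grapheme clusters with \X.
function splitGraphemes(string $str): array {
    preg_match_all('/\X/u', $str, $matches);
    return $matches[0];
}

// Hypothetical helper: split into single code points.
function splitCodePoints(string $str): array {
    return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

// U+304B (か) + U+3099 (combining dakuten) + U+FF11 (full-width 1)
$text = "\u{304B}\u{3099}\u{FF11}";

echo count(splitGraphemes($text)), "\n";  // 2: the mark stays with its base
echo count(splitCodePoints($text)), "\n"; // 3: the mark becomes its own element
```

For text that contains only precomposed characters, both approaches return the same pieces; \X only matters once combining sequences show up.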
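Also, since you are matching Unicode digits, you must pass the u flag in the second regex as well. Without it, PCRE reads the pattern as raw bytes, so the character class matches a range of bytes rather than the ten full-width digits, which is why other full-width characters get through: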
$fullwidthPattern = '/([０-９])/u';
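Putting it together, the following is a minimal sketch of the whole task (extracting the full-width digits and converting them to half-width) rather than a drop-in replacement for your script. The function names and the $sample string are hypothetical stand-ins for your fetched page. The file is assumed to be saved as UTF-8 so the literal full-width digits are encoded correctly.

```php
<?php
// Hypothetical sample text standing in for the cURL output.
$sample = '価格は１２３４円、ＡＢＣ、電話は０３の５５５';

// Collect every full-width digit (U+FF10 to U+FF19) in one pass.
// The u flag makes PCRE treat pattern and subject as UTF-8.
function extractFullWidthDigits(string $input): array {
    preg_match_all('/[０-９]/u', $input, $matches);
    return $matches[0];
}

// Replace each full-width digit with its ASCII counterpart.
function toHalfWidthDigits(string $input): string {
    $fullWidth = ['０', '１', '２', '３', '４', '５', '６', '７', '８', '９'];
    $map = [];
    foreach ($fullWidth as $value => $char) {
        $map[$char] = (string) $value; // e.g. '３' => '3'
    }
    return strtr($input, $map);
}

$digits = extractFullWidthDigits($sample);
echo toHalfWidthDigits(implode('', $digits)), "\n"; // prints 123403555
```

Matching the digits directly with preg_match_all avoids splitting the page into characters first. If the mbstring extension is available, mb_convert_kana($input, 'n') performs the same full-width to half-width digit conversion.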