I am trying to remove some strange printing characters from several files whose contents have been read into a PHP string.
I have tried using preg_replace to remove the strange characters, but haven't had much success. The strange part is that the regex I used with preg_replace does seem to work when I test it in a web-based regex tester, so I am confused as to why it doesn't work when I put the same regex in my PHP file.
The input data is just over 2000 lines. Below is a snippet of it showing the þ, which is what I want to remove, along with the $NoCode:
$800C5304 0063
$800C5306 0063
$800C5308 0063
$800C530A 0063
$800C530C 0063
$800C530E 0063
$800C5310 0063
$800C5312 0063
$800C5314 0063
$800C5316 0063
$800C5318 0063
$800C531A 0063
$800C531C 0063
þ
$NoCode
This is the regex I have tried with preg_replace:
$fileData = preg_replace("/\$([A-F0-9]+) ([A-F0-9]+)\n(.+)\n\$NoCode/", "'\$$1 $2'", $fileData);
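One plausible cause of the tester/PHP mismatch: inside a double-quoted PHP string, \$ is an escape sequence that yields a plain $, so the pattern PCRE actually receives starts with /$( and that $ is treated as an end-of-subject anchor rather than a literal dollar sign. A regex tester receives the backslash intact, which would explain why it matches there. The sketch below puts the pattern in single quotes so the backslash reaches PCRE. It assumes \n or \r\n line endings, uses a hypothetical sample string, and uses a simplified replacement that just keeps the address/value line.

```php
<?php
// Hypothetical sample modelled on the end of the snippet above:
// a lone 0xFE byte (displayed as "þ") followed by the $NoCode marker.
$fileData = "\$800C531A 0063\n\$800C531C 0063\n\xFE\n\$NoCode";

// Double quotes: "\$" becomes "$", which PCRE reads as an anchor, so this finds nothing.
$brokenHits = preg_match("/\$NoCode/", $fileData);

// Single quotes: PCRE receives "\$", i.e. a literal dollar sign.
$workingHits = preg_match('/\$NoCode/', $fileData);

// Keep the address/value line, drop the junk line and the $NoCode line.
// \r? tolerates Windows line endings; [^\r\n]+ matches the junk line byte by byte.
$pattern = '/(\$[A-F0-9]+ [A-F0-9]+)\r?\n[^\r\n]+\r?\n\$NoCode/';
$fileData = preg_replace($pattern, '$1', $fileData);

echo $fileData, "\n";
```

The same single-quote rule applies to the replacement string: '$1' reaches preg_replace unchanged as a backreference.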
From a link I found, the þ seems to be a UTF-16 byte order mark, or at least part of one.
When I run iconv(mb_detect_encoding($fileData), 'UTF-8', $fileData); I get:
iconv(): Detected an illegal character in input string
If I run iconv('UTF-16', 'UTF-8', $fileData) instead, I get:
iconv(): Detected an incomplete multibyte character in input
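Both errors are consistent with a lone 0xFE byte (which Latin-1/Windows-1252 displays as þ): that byte can never occur in well-formed UTF-8, and treating mostly-ASCII data as UTF-16 is probably the wrong interpretation to begin with. That reading is an inference, not something the error messages prove. A quick way to check, sketched below with a hypothetical sample string, is to validate the string with mb_check_encoding and print the offset and value of every non-ASCII byte.

```php
<?php
// Hypothetical sample: the tail of the data with a lone 0xFE byte.
$fileData = "\$800C531C 0063\n\xFE\n\$NoCode";

// 0xFE is never valid in UTF-8, so this is false for the sample.
$isUtf8 = mb_check_encoding($fileData, 'UTF-8');
var_dump($isUtf8);

// Without the /u modifier PCRE works on bytes, so this finds every byte >= 0x80.
preg_match_all('/[\x80-\xFF]/', $fileData, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as [$byte, $offset]) {
    printf("offset %d: 0x%s\n", $offset, bin2hex($byte));
}
```

If the report shows 0xFE rather than the UTF-8 pair 0xC3 0xBE, the character is a raw byte and can be targeted as "\xFE".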
Answer:
So it seems the þ was an incomplete multibyte sequence. I fixed this with the command below, which replaces invalid multibyte sequences:
$fileData = mb_convert_encoding($fileData, 'UTF-8', 'UTF-8');
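Converting from UTF-8 to UTF-8 doesn't strictly delete anything: mb_convert_encoding replaces each invalid byte sequence with the current substitute character, which is ? by default. The sketch below sets that character explicitly with mb_substitute_character; passing 'none' instead should make the conversion drop invalid bytes, which would avoid a second clean-up step. The sample string is hypothetical.

```php
<?php
// Hypothetical sample containing one invalid UTF-8 byte (0xFE).
$raw = "\$800C531C 0063\n\xFE\n\$NoCode";

// Substitute invalid bytes with "?" (U+003F), the usual default.
mb_substitute_character(0x3F);
$withQuestionMark = mb_convert_encoding($raw, 'UTF-8', 'UTF-8');

// Substitute nothing: invalid bytes are dropped from the output.
mb_substitute_character('none');
$dropped = mb_convert_encoding($raw, 'UTF-8', 'UTF-8');

var_dump($withQuestionMark, $dropped);
```

Note that mb_substitute_character changes a global setting for the rest of the request, so code that relies on the default may want to restore it afterwards.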
This left a ? where the þ originally was, which I then removed with the following:
$fileData = str_replace("\n?\n\$NoCode", '', $fileData);
Answer:
str_replace should be faster than preg_replace. Here is an example:
$input = file_get_contents('input.txt');
$output = str_replace(['þ','$NoCode'], '', $input);
file_put_contents('output.txt', $output);
Or, if you want to get rid of the empty lines too:
$input = file_get_contents('input.txt');
$output = str_replace(["þ\r\n","\$NoCode\r\n", "þ\n","\$NoCode\n", "þ\r", "\$NoCode\r"], '', $input);
file_put_contents('output.txt', $output);
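One caveat with both str_replace versions: if the þ in the data is really a single 0xFE byte (as the iconv errors in the question suggest), the 'þ' literal in a PHP file saved as UTF-8 is the two bytes 0xC3 0xBE and will not match it. The sketch below targets the raw byte instead, assuming \n or \r\n line endings and using a hypothetical sample string. It shows a byte-level str_replace and, as an alternative, a single multiline preg_replace that deletes each þ or $NoCode line together with its line break.

```php
<?php
// Hypothetical sample with Windows line endings and a lone 0xFE byte.
$input = "\$800C531C 0063\r\n\xFE\r\n\$NoCode\r\n\$800C5320 0063\r\n";

// "\u{FE}" is the UTF-8 encoding of þ (two bytes), so it does not match 0xFE.
$utf8Thorn = "\u{FE}";
$unchanged = str_replace($utf8Thorn, '', $input);

// Byte-level str_replace: "\r\n" variants first so the whole line break is consumed.
$viaStrReplace = str_replace(
    ["\xFE\r\n", "\$NoCode\r\n", "\xFE\n", "\$NoCode\n"],
    '',
    $input
);

// Regex alternative: /m makes ^ match at each line start, \R? eats the line break.
$viaRegex = preg_replace('/^(?:\xFE|\$NoCode)\R?/m', '', $input);

echo $viaRegex;
```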