Home > Software engineering >  How do I to export utf8mb4 mysql data using php to a csv file
How do I to export utf8mb4 mysql data using php to a csv file

Time:06-24

I am looking for help on a csv file export.

I have a mysql database encoded as utf8mb4 (unicode_ci) with a table using collation utf8mb4_unicode_ci for my fields. The data contains special characters such as copyright symbols, foreign characters such as "é", etc. I am trying to export data to a csv file but the string values that contain special characters are not translating over properly. For example, the copyright symbol comes up as "¬Æ" in the csv file I generate.

My environment is Laravel 7, PHP 7 and MySQL 5.7 on Ubunutu 18.0.4. My database connection is already setup as charset = "utf8mb4" and collation = "utf8mb4_unicode_ci" in my Laravel database config file. The meta tag in my page header is already set to use charset=utf-8 and the header used to generate the csv file is set to:

header('Content-Type: text/csv; charset=utf-8');

I have tried using:

iconv("utf-8", "ascii//TRANSLIT//IGNORE", $mystring);

but this only replaces some of the values with ascii representations and not the proper symbols. I have also tried using something like

htmlspecialchars($mystring, ENT_QUOTES, "UTF-8"); 

but this still returns "®" for the copyright symbol and other strange character sequences in the csv file. When I echo the values in php, they appear correctly on my page. Am I right in thinking that I need to somehow convert the utf8mb4 string to regular utf-8 when I append the row to my csv file? I have not been able to find a solution and am looking for some help.

Can anyone tell me what I need to do to get the expected symbols in my csv file?

CodePudding user response:

Jerry's comment

You don't show the code you use to actually write the file. Also, you don't say how you're inspecting the result (if you are using Excel, that could be the problem).

and Sammitch's comment

It's not that the data is not exporting properly, it's that the program that is reading or displaying it is not using the correct charset. You can try adding a UTF8 BOM \xEF\xBB\xBF to the beginning of the file and the program may use that as a signal to apply the correct charset. Failing that, look up how to open UTF8 CSVs properly in that program. Failing that you'll need to translate the data to a charset that the program does handle correctly.

were helpful. I was using Excel to preview the file. When I looked at the raw csv data in a code editor, the expected characters are there so it is something with the way Excel handles the file. Since I am working on a Mac and the © symbol is being entered with [Option] [G], the é is [Option] [E], etc. it would make sense that it could be a translation problem with how Excel reads the file. Adding \xEF\xBB\xBF to the beginning of the file seems to have done the trick!

CodePudding user response:

  • If you stored utf8 values into a column declared latin1, fix that first.
  • Do not use any conversion routines.
  • Do verify the data in the tables using SELECT(hex) and SHOW CREATE TABLE

More: Trouble with UTF-8 characters; what I see is not what I stored

  • Related