I have thousands of JSON files created by an old PHP application, and they will be imported into a new version of the application being developed in Rails.
In PHP I'm running json_encode($object) to encode each item before saving it.
Here is an edited-down version of the JSON being produced. The description field is where the problematic Unicode escape appears.
{
  "ID": "",
  "parentID": "",
  "formID": "",
  "defaultProject": "",
  "data": {
    "title": "",
    "idno": "",
    "date": "2016-11-09",
    "creator": [],
    "contributor": [],
    "itemNumber": "",
    "oclcNumber": "",
    "publisher": "",
    "publisherLocation": "",
    "description": "<..contents removed> family\u00e2\u0080\u0099s land <..contents removed> \r\n",
    "subject": [],
    "type": "",
    "provenanceDpla": "",
    "rights": "",
    "location": [],
    "timePeriod": "",
    "format": [],
    "language": [],
    "source": "",
    "extent": ""
  },
  "metadata": "",
  "idno": "",
  "modifiedTime": "",
  "createTime": "",
  "modifiedBy": "",
  "createdBy": "",
  "publicRelease": ""
}
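The escape sequence in the description field is easier to reason about at the level of code points. This short Ruby sketch (standard json library only; the variable names and the code_points helper are illustrative) parses the fragment both ways. The exported escapes decode to three characters, U+00E2, U+0080 and U+0099, which are exactly the three UTF-8 bytes of the intended apostrophe U+2019 (RIGHT SINGLE QUOTATION MARK), each turned into a character of its own.

```ruby
require "json"

# Single-quoted Ruby literals keep the backslashes, so the JSON parser sees
# the same \uXXXX escapes that appear in the exported files.
bad_json  = '{"description": "family\u00e2\u0080\u0099s land"}'
good_json = '{"description": "family\u2019s land"}'

bad  = JSON.parse(bad_json)["description"]
good = JSON.parse(good_json)["description"]

# Illustrative helper: list the code points of a string as U+XXXX labels.
def code_points(str)
  str.codepoints.map { |c| format("U+%04X", c) }
end

p code_points(bad[6, 3])   # the characters after "family": ["U+00E2", "U+0080", "U+0099"]
p code_points(good[6, 1])  # the single intended apostrophe: ["U+2019"]
p "\u2019".bytes.map { |b| format("%02X", b) }  # UTF-8 bytes of U+2019: ["E2", "80", "99"]
```

So JSON.parse is doing its job correctly; the damage is already present in the file.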
The problem is in the description field. The original text looks like this:
When I view the imported record, this part looks like this:
Inspecting the item in the Rails console, it looks like this:
On the Rails side I'm reading each file with @hash = JSON.parse(File.read(file)).
Does anyone have a good recommendation for how to handle this? I'm sure we will run into it more as we continue exporting the content.
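For files that have already been exported, one option is to reverse the damage after parsing instead of re-exporting. A minimal sketch, assuming each affected string was double-encoded exactly once through ISO-8859-1: encode the string back to ISO-8859-1 (one byte per character in the range U+0000 to U+00FF), then reinterpret those bytes as UTF-8. repair_double_encoding and repair_tree are hypothetical helpers, and the check is only a heuristic: strings containing a character above U+00FF, or whose bytes do not form valid UTF-8, are left unchanged, but a legitimate Latin-1 string whose bytes happen to be valid UTF-8 (for example the literal text "Ã©") would be wrongly "repaired".

```ruby
require "json"

# Hypothetical helper: undo one layer of UTF-8 -> ISO-8859-1 -> UTF-8 double
# encoding. Heuristic: only applied when every character fits in ISO-8859-1
# and the resulting bytes are valid UTF-8.
def repair_double_encoding(str)
  return str if str.ascii_only?

  latin1 = str.encode("ISO-8859-1")        # one byte per character
  fixed  = latin1.force_encoding("UTF-8")  # reinterpret those bytes as UTF-8
  fixed.valid_encoding? ? fixed : str
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  str # e.g. contains a character above U+00FF, so not double-encoded this way
end

# Hypothetical helper: walk a parsed JSON structure and repair every string
# value (hash keys are left alone).
def repair_tree(node)
  case node
  when Hash   then node.transform_values { |v| repair_tree(v) }
  when Array  then node.map { |v| repair_tree(v) }
  when String then repair_double_encoding(node)
  else node
  end
end

raw = '{"data": {"description": "family\u00e2\u0080\u0099s land", "title": "caf\u00e9"}}'
fixed = repair_tree(JSON.parse(raw))
puts fixed["data"]["description"] # prints family’s land
puts fixed["data"]["title"]       # prints café (left unchanged)
```

In the import this would wrap the existing call, e.g. @hash = repair_tree(JSON.parse(File.read(file))). Fixing the export itself is cleaner whenever re-exporting is still possible.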
Answer:
In the PHP export I was running $utf_encoded = mb_convert_encoding( $item, 'UTF-8' ); to ensure that everything was encoded as UTF-8, but for some reason this was producing the result above. Removing that call gave me the escape \u2019 instead of \u00e2\u0080\u0099, and that imported into the new Rails app correctly.
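There is a likely explanation for the "for some reason". Called as mb_convert_encoding($item, 'UTF-8') with no source-encoding argument, the function assumes the input is in the current internal encoding. If that was ISO-8859-1 on the old application (an assumption; the question does not say), text that was already UTF-8 gets converted a second time. The escapes in the file fit this: reading the bytes E2 80 99 as Windows-1252 would have produced \u00e2\u20ac\u2122, while ISO-8859-1 gives exactly \u00e2\u0080\u0099. The Ruby sketch below reproduces the effect. double_encode is a hypothetical stand-in for the faulty conversion, not PHP's function, and JSON.generate with ascii_only: true is used to mimic json_encode escaping non-ASCII characters as \uXXXX.

```ruby
require "json"

original = "family\u2019s land" # correct UTF-8: the apostrophe is bytes E2 80 99

# Hypothetical stand-in for a conversion that wrongly assumes ISO-8859-1 input:
# each UTF-8 byte is read as its own Latin-1 character and re-encoded as UTF-8.
def double_encode(utf8_str)
  utf8_str.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# ascii_only: true escapes every non-ASCII character as \uXXXX.
puts JSON.generate({ "description" => original }, ascii_only: true)
puts JSON.generate({ "description" => double_encode(original) }, ascii_only: true)
```

The first line contains the single escape for U+2019; the second contains the same three escapes as the exported description, which is why dropping the extra conversion fixed the import.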