Rails JSON Import - unicode characters detected

Time:05-06

I have thousands of JSON files created by an old PHP application that will be imported into a new version being developed in Rails.

In PHP I'm running json_encode($object) to encode each item before saving it.

Here is an edited-down version of the JSON that is being produced. The description field is where I'm seeing the unexpected Unicode characters.

{ "ID": "", "parentID": "", "formID": "", "defaultProject": "", "data": { "title": "", "idno": "", "date": "2016-11-09", "creator": [ ], "contributor": [ ], "itemNumber": "", "oclcNumber": "", "publisher": "", "publisherLocation": "", "description": "<..contents removed> family\u00e2\u0080\u0099s land <..contents removed> \r\n", "subject": [ ], "type": "", "provenanceDpla": "", "rights": "", "location": [ ], "timePeriod": "", "format": [ ], "language": [ ], "source": "", "extent": "" }, "metadata": "", "idno": "", "modifiedTime": "", "createTime": "", "modifiedBy": "", "createdBy": "", "publicRelease": "" }

The part we are having issues with is in the description field. The original text looks like this:

[screenshot not preserved]

When I view the imported record, this part looks like this:

[screenshot not preserved: garbage characters]

Inspecting the item in the Rails console shows this:

[screenshot not preserved]

I'm using @hash = JSON.parse(File.read(file)). Does anyone have a good recommendation on how to handle this? I'm sure we will run into this more as we work on exporting the content.

CodePudding user response:

In the PHP export I was running $utf_encoded = mb_convert_encoding( $item, 'UTF-8' ); to ensure that everything was encoded as UTF-8, but for some reason this was producing the result above. Removing that call gave me \u2019 instead of \u00e2\u0080\u0099, and importing that into the new Rails app worked correctly.
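A likely explanation, offered as an assumption rather than something confirmed here: when mb_convert_encoding is called without a source encoding it falls back to PHP's internal encoding, and if that was ISO-8859-1 the three UTF-8 bytes of the right single quote (E2 80 99) were each treated as a separate character, which is exactly \u00e2\u0080\u0099. If some files were already exported that way and cannot be regenerated, a repair on the Rails side might look roughly like the sketch below. The helper names repair_mojibake and deep_repair are hypothetical, not part of Rails or the json gem, and the sketch only fixes strings that were double-encoded as a whole.

```ruby
require 'json'

# Reproduce the problem: treat each UTF-8 byte as its own code point.
original = "family\u2019s land"
double_encoded = original.bytes.pack('U*') # "family\u00e2\u0080\u0099s land"

# Hypothetical helper: undo one round of Latin-1-style double encoding.
# Returns the string unchanged when it does not look double-encoded.
def repair_mojibake(str)
  codepoints = str.codepoints
  return str if codepoints.all? { |cp| cp < 0x80 } # pure ASCII, nothing to do
  return str if codepoints.any? { |cp| cp > 0xFF } # real Unicode, not a byte misread

  # Turn each code point back into a single byte and reinterpret as UTF-8.
  candidate = codepoints.pack('C*').force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
end

# Hypothetical helper: apply the repair to every string in parsed JSON.
def deep_repair(value)
  case value
  when String then repair_mojibake(value)
  when Array  then value.map { |v| deep_repair(v) }
  when Hash   then value.transform_values { |v| deep_repair(v) }
  else value
  end
end

# Single quotes keep the \u escapes literal so JSON.parse decodes them.
raw = '{"data": {"description": "family\u00e2\u0080\u0099s land"}}'
hash = deep_repair(JSON.parse(raw))
puts hash['data']['description']
```

In the import this would wrap the existing call, for example @hash = deep_repair(JSON.parse(File.read(file))). Strings that contain genuine Latin-1 range characters such as é are left alone, because reinterpreting their code points as bytes does not yield valid UTF-8. Fixing the export, as described above, is still the cleaner option where it is possible.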
