How to convert a string of utf-8 bytes into a unicode emoji in python

Time:04-08

So I have downloaded Facebook Messenger messages which are a couple of json files containing lots of information. Here's a snippet of the json:

"messages": [
{
  "sender_name": "sample name",
  "timestamp_ms": 1649215459023,
  "content": "sample message",
  "reactions": [
    {
      "reaction": "\u00f0\u009f\u0098\u0086",
      "actor": "actor name"
    }
  ],
  "type": "Generic",
  "is_unsent": false
}

]

What I want to do is read this json and later create a dataframe from it. But all the non-ASCII characters have been replaced with escape sequences like \u00f0\u009f\u0098\u0086, so an emoji, for example, isn't recognized as one.

My question is, what do I need to do to actually see those emojis as they are, instead of those codes? I thought about using regex to find all of those patterns, but I don't know what exactly I should replace them with.

CodePudding user response:

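Yes, I encountered the same problem when trying to decode a Facebook message dump. Each \u00XX escape in the export is really one byte of the emoji's UTF-8 encoding, so encoding the string as Latin-1 recovers the original bytes, and decoding those as UTF-8 gives the emoji. Here's how I solved it: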

string = "\u00f0\u009f\u0098\u0086".encode("latin-1").decode("utf-8")
# '😆'
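
To apply the same Latin-1 round trip to a whole export rather than a single string, something like the following might work. This is only a sketch: fix_mojibake is a hypothetical helper name, the inline raw string stands in for the contents of one of your json files, and it assumes every mangled string in the file was mangled the same way. Strings that can't be round-tripped are left as they are.

```python
import json


def fix_mojibake(value):
    """Recursively repair strings whose UTF-8 bytes were stored as Latin-1 code points.

    Hypothetical helper, not part of any library.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not a mangled string (or mangled differently): leave it untouched.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(key): fix_mojibake(item) for key, item in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Stand-in for the text of one exported file (assumption: same shape as the snippet in the question).
raw = r'''
{"messages": [{"sender_name": "sample name",
               "timestamp_ms": 1649215459023,
               "content": "sample message",
               "reactions": [{"reaction": "\u00f0\u009f\u0098\u0086",
                              "actor": "actor name"}],
               "type": "Generic",
               "is_unsent": false}]}
'''

data = fix_mojibake(json.loads(raw))
reaction = data["messages"][0]["reactions"][0]["reaction"]
print(reaction)
```

After that, data["messages"] is a plain list of dicts with readable text, which you should be able to pass to pandas.DataFrame as usual.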