I have downloaded my Facebook Messenger messages, which come as a couple of JSON files containing lots of information. Here's a snippet of the JSON:
"messages": [
  {
    "sender_name": "sample name",
    "timestamp_ms": 1649215459023,
    "content": "sample message",
    "reactions": [
      {
        "reaction": "\u00f0\u009f\u0098\u0086",
        "actor": "actor name"
      }
    ],
    "type": "Generic",
    "is_unsent": false
  }
]
What I want to do is read this JSON and later create a dataframe from it, but since all the non-ASCII characters have been replaced with codes like \u00f0\u009f\u0098\u0086, an emoji, for example, is not recognized as one.
My question is: what do I need to do to see those emojis as they are, instead of those codes? I thought about using a regex to find all of those patterns, but I don't know what exactly I should replace them with.
CodePudding user response:
Yes, I encountered the same problem when trying to decode a Facebook message dump. Here's how I solved it:
string = "\u00f0\u009f\u0098\u0086".encode("latin-1").decode("utf-8")
# '😆'
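Why this works: the export appears to write each byte of the UTF-8 encoding as its own \u00XX escape, so the JSON parser gives you one character per byte. Encoding to Latin-1 maps those characters back to the original bytes, and decoding as UTF-8 then yields the real character. To apply it to a whole file rather than a single string, something like the sketch below could work. The helper name `fix_mojibake` is my own (hypothetical), and the inline `raw` string just stands in for the contents of one of your JSON files.

```python
import json


def fix_mojibake(value):
    """Recursively repair strings whose UTF-8 bytes were stored as Latin-1 characters.

    fix_mojibake is a hypothetical helper name, not part of any library.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # The string was not mangled in this way, so leave it untouched.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(key): fix_mojibake(item) for key, item in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Stand-in for the text of one exported file (raw string keeps the \u escapes literal).
raw = r'''
{
  "messages": [
    {
      "sender_name": "sample name",
      "timestamp_ms": 1649215459023,
      "content": "sample message",
      "reactions": [
        {"reaction": "\u00f0\u009f\u0098\u0086", "actor": "actor name"}
      ],
      "type": "Generic",
      "is_unsent": false
    }
  ]
}
'''

data = fix_mojibake(json.loads(raw))
reaction = data["messages"][0]["reactions"][0]["reaction"]
print(reaction)
```

With a real export you would read the file with `json.load` instead of using `raw`, and the repaired `data["messages"]` list can then be passed to `pandas.DataFrame` to build the dataframe. Strings that are plain ASCII or already correct come back unchanged, so running the helper over every field should be safe.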