I am trying to pull Facebook data for a business I run and put it into a pandas DataFrame. Some posts have comments and others do not, and I want one DataFrame that covers both cases.
The JSON I have is this:
{'data': [{'id': 'user_id_post_id1'},
          {'id': 'user_id_post_id2'},
          {'id': 'user_id_post_id3'},
          {'comments': {'data': [{'created_time': '2022-11-09T00:15:29+0000',
                                  'message': 'comment_id',
                                  'id': 'user_who_commented_the_id_comment_id'}]},
           'id': 'user_id_post_id4'},
          {'id': 'user_id_post_id5'}...]}
I am trying to get a pandas df that looks like this:
df = pd.DataFrame(data = data)
print(df)
0  User ID and Post ID  comment      Commenter_id
1  user_id_post_id      0 or N/A     0 or N/A
2  user_id_post_id1     0 or N/A     0 or N/A
2  user_id_post_id2     0 or N/A     0 or N/A
3  user_id_post_id3     Comment_id   user_who_commented_the_id_comment_id
4  user_id_post_id3     Comment_id*  user_who_commented_the_id_comment_id
2  user_id_post_id4     0 or N/A     0 or N/A
* means another comment under the same User ID and Post ID
And so on
I know how to do this when the JSON is not doubly nested, but I am having trouble flattening the nested comments into the DataFrame. I tried this command, to no avail:
df = pd.json_normalize(data=basic_insight["data"]["comments"])
and got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_1/FileName.py in <module>
----> 1 df = pd.json_normalize(data=basic_insight["data"]["comments"])
TypeError: list indices must be integers or slices, not str
Any help would be appreciated!
CodePudding user response:
Try:
import pandas as pd

data = {
    "data": [
        {"id": "user_id_post_id1"},
        {"id": "user_id_post_id2"},
        {"id": "user_id_post_id3"},
        {
            "comments": {
                "data": [
                    {
                        "created_time": "2022-11-09T00:15:29+0000",
                        "message": "comment_id",
                        "id": "user_who_commented_the_id_comment_id",
                    }
                ]
            },
            "id": "user_id_post_id4",
        },
        {"id": "user_id_post_id5"},
    ]
}

# Collect each post's ID together with its list of comment dicts
# (None when the post has no "comments" key).
tmp = [
    {
        "User ID and Post ID": d["id"],
        "Commenter_id": d.get("comments", {}).get("data"),
    }
    for d in data["data"]
]

# One row per comment; posts without comments keep a single row.
df = pd.DataFrame(tmp).explode("Commenter_id")
# Pull the fields out of each comment dict.
df["comment"] = df["Commenter_id"].str["message"]
df["Commenter_id"] = df["Commenter_id"].str["id"]
print(df)
Prints:
   User ID and Post ID                          Commenter_id     comment
0     user_id_post_id1                                  None        None
1     user_id_post_id2                                  None        None
2     user_id_post_id3                                  None        None
3     user_id_post_id4  user_who_commented_the_id_comment_id  comment_id
4     user_id_post_id5                                  None        None
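
As for the error in the question: the value under "data" is a list of post dicts, so indexing it with the string "comments" raises the TypeError shown in the traceback. Each post has to be visited individually. If you prefer an explicit loop over explode, a minimal sketch could look like the following. The sample payload, the post and commenter IDs, and the variable names (payload, rows) are made up for illustration; only the overall shape follows the JSON in the question.

```python
import pandas as pd

# Hypothetical sample shaped like the response in the question:
# one post without comments, one post with two comments.
payload = {
    "data": [
        {"id": "post_1"},
        {
            "comments": {
                "data": [
                    {"message": "first comment", "id": "commenter_a"},
                    {"message": "second comment", "id": "commenter_b"},
                ]
            },
            "id": "post_2",
        },
    ]
}

# payload["data"] is a list, so a string index fails just like in the question.
try:
    payload["data"]["comments"]
except TypeError as exc:
    error_message = str(exc)

rows = []
for post in payload["data"]:
    # Posts without a "comments" key fall back to an empty list.
    comments = post.get("comments", {}).get("data", [])
    if not comments:
        # Keep the post as a single row with missing comment fields.
        rows.append(
            {"User ID and Post ID": post["id"], "comment": None, "Commenter_id": None}
        )
    for comment in comments:
        # One row per comment, repeating the post ID.
        rows.append(
            {
                "User ID and Post ID": post["id"],
                "comment": comment.get("message"),
                "Commenter_id": comment.get("id"),
            }
        )

df = pd.DataFrame(rows, columns=["User ID and Post ID", "comment", "Commenter_id"])
print(df)
```

This gives one row for the post without comments and one row per comment for the other post, which matches the layout asked for in the question. If you want 0 or "N/A" instead of missing values, df.fillna("N/A") should do it.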