How to get a Pandas dataframe from instagram JSON data

Time:06-24

I am quite new to all this. I took a short Python bootcamp a while back and am now struggling to get some Instagram data into a format I understand.

Using the following code:

# Importing packages
import json
import re
import collections
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Loading downloaded instagram data
json_data = {}
data_path = "C:/Users/etc.json"
with open(data_path) as file:
    json_data = json.load(file)

print(json_data)

I get the following output which looks promising:

{'relationships_followers': [{'title': '', 'media_list_data': [], 'string_list_data': [{'href': 'https://www.instagram.com/username1', 'value': 'username1', 'timestamp': 1655411505}]}, {'title': '', 'media_list_data': [], 'string_list_data': [{'href': 'https://www.instagram.com/username2', 'value': 'username2', 'timestamp': 1655149264}]}, {'title': '', 'media_list_data': [], 'string_list_data': [{'href': 'https://www.instagram.com/username3', 'value': 'username3', 'timestamp': 1655129904}]}, etc.....

The type of json_data is dict.

But when I try to convert it into a pandas DataFrame, the result looks strange:

dfp = pd.read_json(data_path, orient = 'records')
print(dfp)
print(type(dfp))

Output:

                               relationships_followers
0    {'title': '', 'media_list_data': [], 'string_l...
1    {'title': '', 'media_list_data': [], 'string_l...
2    {'title': '', 'media_list_data': [], 'string_l...
3    {'title': '', 'media_list_data': [], 'string_l...
4    {'title': '', 'media_list_data': [], 'string_l...
..                                                 ...
575  {'title': '', 'media_list_data': [], 'string_l...
576  {'title': '', 'media_list_data': [], 'string_l...
577  {'title': '', 'media_list_data': [], 'string_l...
578  {'title': '', 'media_list_data': [], 'string_l...
579  {'title': '', 'media_list_data': [], 'string_l...

[580 rows x 1 columns]
<class 'pandas.core.frame.DataFrame'>

How do I stop pandas from treating "relationships_followers" as a single column?

I am trying to get output like the following:

         href             value          timestamp
0        www.inst...      username1      DDMMYY
1        www.inst...      username2      DDMMYY
2        www.inst...      username3      DDMMYY
3        www.inst...      username4      DDMMYY
...
578      www.inst...      username578    DDMMYY
579      www.inst...      username579    DDMMYY

CodePudding user response:

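Try this on your master dict. 'relationships_followers' holds a list of entries, and each entry keeps its records in a list under 'string_list_data', so flatten those lists before building the DataFrame.

# Note the plural key name: 'relationships_followers'
worthy_data = json_data.get('relationships_followers', [])

# Collect every record from each entry's 'string_list_data' list
wanted_dicts = [record for entry in worthy_data for record in entry.get('string_list_data', [])]

pd.DataFrame(wanted_dicts)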

CodePudding user response:

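In this case you can use pd.json_normalize() with string_list_data as the record path. Each entry in relationships_followers holds a list of records under that key, and json_normalize turns every record into a row with href, value, and timestamp columns.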

pd.json_normalize(json_data['relationships_followers'], 'string_list_data')

# Output :
#                                   href      value   timestamp
# 0  https://www.instagram.com/username1  username1  1655411505
# 1  https://www.instagram.com/username2  username2  1655149264
# 2  https://www.instagram.com/username3  username3  1655129904
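The question also asks for the timestamp as a date (DDMMYY). A minimal sketch of that last step follows, assuming the timestamp values are Unix seconds in UTC, which the sample values suggest but the export does not state. The inline json_data below only mirrors the three sample entries from the question, and the date column name is made up for illustration.

```python
import pandas as pd

# Sample data shaped like the export in the question (placeholder usernames).
json_data = {
    "relationships_followers": [
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username1",
             "value": "username1", "timestamp": 1655411505}]},
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username2",
             "value": "username2", "timestamp": 1655149264}]},
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username3",
             "value": "username3", "timestamp": 1655129904}]},
    ]
}

# One row per record found under each entry's 'string_list_data' list.
df = pd.json_normalize(json_data["relationships_followers"],
                       record_path="string_list_data")

# Assumption: the integers are Unix timestamps in seconds (UTC).
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")

# Hypothetical extra column holding the date as a DDMMYY string.
df["date"] = df["timestamp"].dt.strftime("%d%m%y")

print(df[["href", "value", "date"]])
```

Keeping timestamp as a real datetime column and adding the formatted string separately means you can still sort or group by date later.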