Python Reddit API JSON issues (no PRAW)

Time:09-10

I am trying to obtain the replies to comments on Reddit threads. Here is what I have accomplished so far by parsing the JSON:

import requests

subreddit = 'wallstreetbets'
link = 'https://oauth.reddit.com/r/' + subreddit + '/hot'
# headers: the OAuth request headers prepared earlier (not shown)
hot = requests.get(link, headers=headers)
hot.json()

Here is the output:

{'kind': 'Listing',
 'data': {'after': 't3_x8kidp',
  'dist': 27,
  'modhash': None,
  'geo_filter': None,
  'children': [{'kind': 't3',
    'data': {'approved_at_utc': None,
     'subreddit': 'wallstreetbets',
     'selftext': '**Read [rules](https://www.reddit.com/r/wallstreetbets/wiki/contentguide), follow [Twitter](https://twitter.com/Official_WSB) and [IG](https://www.instagram.com/official_wallstreetbets/), join [Discord](https://discord.gg/wallstreetbets), see [ban bets](https://www.reddit.com/r/wallstreetbets/wiki/banbets)!**\n\n[dm mods because why not](https://www.reddit.com/message/compose/?to=/r/wallstreetbets)\n\n[Earnings Thread](https://wallstreetbets.reddit.com/x4ryjg)',
     'author_fullname': 't2_bd6q5',
     'saved': False,
     'mod_reason_title': None,
     'gilded': 0,
     'clicked': False,
     'title': 'What Are Your Moves Tomorrow, September 08, 2022',
     'link_flair_richtext': [{'e': 'text', 't': 'Daily Discussion'}],
     'subreddit_name_prefixed': 'r/wallstreetbets',
     'hidden': False,
     'pwls': 7,
     'link_flair_css_class': 'daily',
     'downs': 0,
     'thumbnail_height': None,
     'top_awarded_type': None,
     'hide_score': False,
     'name': 't3_x8ev67',
...
     'created_utc': 1662594703.0,
     'num_crossposts': 0,
     'media': None,
     'is_video': False}}],
  'before': None}}

I then turned it into a data frame:

rows = []
for post in hot.json()['data']['children']:
    rows.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'created_utc': post['data']['created_utc'],
        'id': post['data']['id'],
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of records
df = pd.DataFrame(rows)

With this, I was able to obtain a data frame with one row per post.

Then, to obtain the comments, I created a list, list_of_comments, holding the JSON responses for all 26 posts, and wrote a while loop to iterate through them:
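The question does not show how list_of_comments was built. A minimal sketch of one way to do it, assuming the `/r/{subreddit}/comments/{post_id}` endpoint, whose response is a two-element list (the post Listing, then the comment Listing). The helper name `fetch_comment_listings` and the `get_json` callable are hypothetical, introduced only for illustration:

```python
from typing import Callable


def fetch_comment_listings(subreddit: str, post_ids, get_json: Callable) -> list:
    """Return one comment Listing per post id.

    get_json is any callable that takes a URL and returns the decoded JSON.
    """
    listings = []
    for post_id in post_ids:
        url = f'https://oauth.reddit.com/r/{subreddit}/comments/{post_id}'
        payload = get_json(url)
        # payload[0] is the Listing holding the post, payload[1] holds its comments
        listings.append(payload[1])
    return listings
```

With requests, `get_json` could be something like `lambda url: requests.get(url, headers=headers).json()`, and `post_ids` could be the 'id' column of the posts data frame.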

indexy = pd.DataFrame()
supereme = len(list_of_comments)
while supereme > 0:
    supereme -= 1
    children = list_of_comments[supereme]['data']['children']
    # len(children) - 1 leaves out the last child of each listing
    for g in range(0, len(children) - 1):
        indexy = pd.concat([indexy, pd.DataFrame.from_records([{
            'body': children[g]['data']['body'],
            'post_id': children[g]['data']['parent_id'],
        }])], ignore_index=True)

indexy

This gave me a data frame of the top-level comments. However, I am not able to obtain the replies to the comments. Any help? I tried this:

posts = 26 
for i in np.arange(0,27):
    print('i',i)
    if len(list_of_comments[i]['data']['children']) == 0:
        continue
    for j in np.arange(0,len(list_of_comments[i]['data']['children'])):
        if len(list_of_comments[i]['data']['children'][j]['data']['replies']) == 0:
            break
        else: 
            print('j',len(list_of_comments[i]['data']['children'][j]['data']['replies']))
            for z in np.arange(len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'])):
                if len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children']) == 0:
                    break
                print('z',z)
                print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])

The first loop kind of works, but it doesn't count up properly to get all the replies to all the posts; it only pulls one or two. We don't want to use PRAW.
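One likely cause, sketched on made-up data: the `break` taken when a comment has no replies ends the whole loop over that post's comments, so every later comment is never examined. Using `continue` would skip only that one comment. The helper names below (`reply`, `comment`, `collect`) are hypothetical, and the sample listing only imitates the shape of the API response, where 'replies' is an empty string when there are none:

```python
def reply(body):
    return {'kind': 't1', 'data': {'body': body, 'replies': ''}}


def comment(body, replies):
    nested = {'kind': 'Listing', 'data': {'children': replies}} if replies else ''
    return {'kind': 't1', 'data': {'body': body, 'replies': nested}}


listing = {'kind': 'Listing', 'data': {'children': [
    comment('first', [reply('r1')]),
    comment('second', []),                        # no replies
    comment('third', [reply('r2'), reply('r3')]),
]}}


def collect(listing, stop_at_empty):
    found = []
    for c in listing['data']['children']:
        replies = c['data']['replies']
        if len(replies) == 0:
            if stop_at_empty:
                break       # abandons every later comment in this post
            continue        # skips only this comment
        for r in replies['data']['children']:
            found.append(r['data']['body'])
    return found


print(collect(listing, stop_at_empty=True))    # ['r1']
print(collect(listing, stop_at_empty=False))   # ['r1', 'r2', 'r3']
```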

CodePudding user response:

import pandas as pd

rows = []
for listing in list_of_comments:
    for comment in listing['data']['children']:
        # 'replies' is an empty string when a comment has no replies
        reply_listing = comment['data'].get('replies')
        if not isinstance(reply_listing, dict):
            continue
        for reply in reply_listing['data']['children']:
            # skip 'more' placeholders, which carry no 'body'
            if reply['kind'] != 't1':
                continue
            rows.append({
                'body': reply['data']['body'],
                'post_id': reply['data']['link_id'],
            })

replies = pd.DataFrame(rows)
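The loop above reaches only direct replies to top-level comments. If replies at every depth are wanted, a recursive walk is one option. This is a sketch, assuming each element of list_of_comments is a comment Listing in which comments have kind 't1' and 'replies' is either a nested Listing or an empty string. The names `walk_comments` and `collect_replies` and the extra 'depth' field are hypothetical additions:

```python
def walk_comments(listing, depth=0):
    """Yield one row per comment in a Listing, descending into its replies."""
    if not isinstance(listing, dict):       # '' or None means "no replies"
        return
    for child in listing['data']['children']:
        if child.get('kind') != 't1':       # skip 'more' placeholders
            continue
        data = child['data']
        yield {
            'body': data.get('body'),
            'post_id': data.get('link_id'),
            'parent_id': data.get('parent_id'),
            'depth': depth,                 # 0 = top-level comment
        }
        yield from walk_comments(data.get('replies'), depth + 1)


def collect_replies(list_of_comments):
    """Return rows for replies only (depth > 0) across all listings."""
    return [row
            for listing in list_of_comments
            for row in walk_comments(listing)
            if row['depth'] > 0]
```

The result can be passed straight to `pd.DataFrame(collect_replies(list_of_comments))`. Replies hidden behind 'more' placeholders are not fetched by this sketch and would need extra requests.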