I am trying to obtain the replies to comments on Reddit threads. Here is what I have been able to accomplish so far by parsing the JSON:
subreddit = 'wallstreetbets'
link = 'https://oauth.reddit.com/r/' + subreddit + '/hot'
hot = requests.get(link,headers = headers)
hot.json()
Here is the output:
{'kind': 'Listing',
'data': {'after': 't3_x8kidp',
'dist': 27,
'modhash': None,
'geo_filter': None,
'children': [{'kind': 't3',
'data': {'approved_at_utc': None,
'subreddit': 'wallstreetbets',
'selftext': '**Read [rules](https://www.reddit.com/r/wallstreetbets/wiki/contentguide), follow [Twitter](https://twitter.com/Official_WSB) and [IG](https://www.instagram.com/official_wallstreetbets/), join [Discord](https://discord.gg/wallstreetbets), see [ban bets](https://www.reddit.com/r/wallstreetbets/wiki/banbets)!**\n\n[dm mods because why not](https://www.reddit.com/message/compose/?to=/r/wallstreetbets)\n\n[Earnings Thread](https://wallstreetbets.reddit.com/x4ryjg)',
'author_fullname': 't2_bd6q5',
'saved': False,
'mod_reason_title': None,
'gilded': 0,
'clicked': False,
'title': 'What Are Your Moves Tomorrow, September 08, 2022',
'link_flair_richtext': [{'e': 'text', 't': 'Daily Discussion'}],
'subreddit_name_prefixed': 'r/wallstreetbets',
'hidden': False,
'pwls': 7,
'link_flair_css_class': 'daily',
'downs': 0,
'thumbnail_height': None,
'top_awarded_type': None,
'hide_score': False,
'name': 't3_x8ev67',
...
'created_utc': 1662594703.0,
'num_crossposts': 0,
'media': None,
'is_video': False}}],
'before': None}}
I then turned it into a data frame:
# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = []
for post in hot.json()['data']['children']:
    rows.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'created_utc': post['data']['created_utc'],
        'id': post['data']['id']
    })
df = pd.DataFrame(rows)
With this, I was able to obtain a data frame like this: [DataFrame]
Then, to obtain the comments, I created a list (list_of_comments) holding the JSON response for each of the 26 posts, and wrote a while loop to iterate through it:
supereme = len(list_of_comments)
indexy = pd.DataFrame()
while supereme > 0:
    supereme -= 1
    for g in range(0, len(list_of_comments[supereme]['data']['children']) - 1):
        indexy = pd.concat([indexy, pd.DataFrame.from_records([{
            'body': list_of_comments[supereme]['data']['children'][g]['data']['body'],
            'post_id': list_of_comments[supereme]['data']['children'][g]['data']['parent_id'] }])], ignore_index = True)
indexy
This gave me this: [DataFrame]
However, I am not able to obtain the replies to the comments. Any help? Here is what I tried:
posts = 26
for i in np.arange(0,27):
    print('i',i)
    if len(list_of_comments[i]['data']['children']) == 0:
        continue
    for j in np.arange(0,len(list_of_comments[i]['data']['children'])):
        if len(list_of_comments[i]['data']['children'][j]['data']['replies']) == 0:
            break
        else:
            print('j',len(list_of_comments[i]['data']['children'][j]['data']['replies']))
            for z in np.arange(len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'])):
                if len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children']) == 0:
                    break
                print('z',z)
                print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])
The first loop kind of works, but it doesn't iterate properly over all the replies to all the posts; it only pulls one or two. We don't want to use PRAW.
CodePudding user response:
replies = pd.DataFrame()
for i in range(len(list_of_comments)):
    # Index with i, not x = len(list_of_comments), which is always out of range
    for comment in list_of_comments[i]['data']['children']:
        try:
            reply_children = comment['data']['replies']['data']['children']
        except (KeyError, TypeError):
            # No replies: 'replies' is missing or an empty string
            continue
        for reply in reply_children:
            if 'body' not in reply['data']:
                # Skip 'more' placeholders, which have no body
                continue
            replies = pd.concat([replies, pd.DataFrame.from_records([{
                'body': reply['data']['body'],
                'post_id': reply['data']['link_id']
            }])], ignore_index=True)
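The loops above only reach the first level of replies. Since each reply can carry its own 'replies' listing, a recursive walk may be a better fit. Below is a minimal sketch, not a definitive implementation. It assumes every element of list_of_comments is a comment Listing shaped like the ones indexed above (a dict with ['data']['children']), that comments have kind 't1', and that a comment without replies has 'replies' set to an empty string. The names iter_comments and collect_comments, and the extra 'depth' column, are made up for illustration.

```python
def iter_comments(listing, depth=0):
    """Yield (comment_data, depth) for every comment in a Listing, depth-first."""
    if not isinstance(listing, dict):
        # Assumption: 'replies' is '' (not a dict) when a comment has no replies.
        return
    for child in listing.get('data', {}).get('children', []):
        if child.get('kind') != 't1':
            # Skip anything that is not a comment, e.g. 'more' placeholders
            # that carry no 'body'.
            continue
        data = child['data']
        yield data, depth
        # Recurse into this comment's own replies, one level deeper.
        yield from iter_comments(data.get('replies'), depth + 1)


def collect_comments(list_of_comments):
    """Flatten every comment and nested reply into a list of row dicts."""
    rows = []
    for listing in list_of_comments:
        for data, depth in iter_comments(listing):
            rows.append({
                'body': data.get('body'),
                'post_id': data.get('link_id'),
                'parent_id': data.get('parent_id'),
                'depth': depth,  # 0 = top-level comment, 1 = reply, ...
            })
    return rows
```

The rows can then be turned into a data frame in one step with pd.DataFrame(collect_comments(list_of_comments)), which also avoids calling pd.concat inside the loop. Comments hidden behind 'more' placeholders are not fetched by this sketch.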