I have the following code that gets the submissions and comments from the tennis subreddit:
# Assumes `reddit` is an authenticated praw.Reddit instance.
headlines = {}
i = 1
for submission in reddit.subreddit('tennis').search("Djokovic loves", sort="relevance", limit=10):
    h = {}
    h['title'] = submission.title
    h['id'] = submission.id
    h['score'] = submission.score
    comments = []  # fresh list per submission so comments are not shared across headlines
    submission.comments.replace_more(limit=0)
    for comment in submission.comments.list():
        c = {}  # fresh dict per comment; reusing one dict would repeat the last comment
        c['author'] = comment.author
        c['body'] = comment.body
        comments.append(c)
    headlines['headline ' + str(i)] = h
    headlines['comments ' + str(i)] = comments
    i += 1
So the output would be something like:
{
    'headline 1': {
        'title': 'abc',
        'id': 123,
        'score': 0.5
    },
    'comments 1': [{'author': 'James', 'body': 'He is good!'}]
}
Is there a way to parse this structure into a pandas DataFrame, or would you suggest another data structure for storing both the submissions and their comments?
Thank you!
CodePudding user response:
import pandas as pd
# One record per submission: the headline dict plus its list of comments.
data = [{'headline': {'title': 'abc', 'id': 123, 'score': 0.5}, 'comment': [{'author': 'James', 'body': 'He is good!'}]}]
df = pd.json_normalize(data, record_path='comment', meta=[['headline', 'title'], ['headline', 'id'], ['headline', 'score']], record_prefix='comment.')
| | comment.author | comment.body | headline.title | headline.id | headline.score |
|---:|:-----------------|:---------------|:-----------------|--------------:|-----------------:|
| 0 | James | He is good! | abc | 123 | 0.5 |
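The `data` list above is not quite the shape the loop in the question builds, so here is a rough sketch of one way to get from one to the other. It assumes the dict uses the keys 'headline N' and 'comments N' numbered from 1, as in the question's loop. The sample values and the `records` name are made up for illustration.

```python
import pandas as pd

# Made-up sample data in the shape the question's loop produces.
headlines = {
    'headline 1': {'title': 'abc', 'id': 123, 'score': 0.5},
    'comments 1': [{'author': 'James', 'body': 'He is good!'}],
    'headline 2': {'title': 'def', 'id': 456, 'score': 0.9},
    'comments 2': [
        {'author': 'Ann', 'body': 'Great match'},
        {'author': 'Bob', 'body': 'Agreed'},
    ],
}

# Each submission contributes two keys, so half the key count is the
# number of submissions (assumes no other keys are stored in the dict).
n = len(headlines) // 2

# Pair each 'headline N' entry with its 'comments N' list.
records = [
    {'headline': headlines[f'headline {i}'], 'comment': headlines[f'comments {i}']}
    for i in range(1, n + 1)
]

# One row per comment, with the parent headline's fields repeated on each row.
df = pd.json_normalize(
    records,
    record_path='comment',
    meta=[['headline', 'title'], ['headline', 'id'], ['headline', 'score']],
    record_prefix='comment.',
)
```

Building `records` as a list inside the scraping loop, instead of the numbered-key dict, would probably let you skip the pairing step entirely.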