Best way to parse nested dictionary in Pandas?

Time:12-01

I have the following code that gets the submissions and comments from the tennis subreddit:

headlines = {}
i = 1
for submission in reddit.subreddit('tennis').search("Djokovic loves", sort="relevance", limit=10):
    h = {}
    comments = []  # fresh list per submission, so headlines do not share comments
    h['title'] = submission.title
    h['id'] = submission.id
    h['score'] = submission.score
    submission.comments.replace_more(limit=0)
    for comment in submission.comments.list():
        c = {}  # new dict per comment; reusing one dict would repeat the last comment
        c['author'] = comment.author
        c['body'] = comment.body
        comments.append(c)
    headlines['headline ' + str(i)] = h
    headlines['comments ' + str(i)] = comments
    i += 1

So the output would be something like:

{'headline 1': 
    {
        'title': 'abc', 
        'id': 123, 
        'score': 0.5
    }, 
    'comment 1': [{'author': 'James', 'body': 'He is good!'}]
}

Is there a way to parse this structure into a pandas DataFrame, or would you suggest another data structure for storing both the submissions and their comments?

Thank you!

CodePudding user response:

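import pandas as pd

# One record per submission: a nested 'headline' dict plus a list of comment dicts.
data = [{'headline': {'title': 'abc', 'id':123, 'score':0.5},'comment': [{'author': 'James', 'body': 'He is good!'}]}]

# One row per comment; the headline fields are repeated on each row as metadata.
df = pd.json_normalize(data, record_path='comment', meta=[['headline', 'title'],['headline', 'id'], ['headline','score']], record_prefix='comment.')
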
    |    | comment.author   | comment.body   | headline.title   |   headline.id |   headline.score |
    |---:|:-----------------|:---------------|:-----------------|--------------:|-----------------:|
    |  0 | James            | He is good!    | abc              |           123 |              0.5 |
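
The call above expects a list of records, while the loop in the question builds a single dict with paired 'headline N' / 'comments N' keys. A rough sketch of one way to bridge the two is below. The sample values and the to_records helper are made up for illustration and are not part of the original post; the helper also assumes the keys follow the 'headline N' / 'comments N' naming exactly.

```python
import pandas as pd

# Hypothetical sample in the shape the question's loop produces:
# one flat dict with paired 'headline N' / 'comments N' keys.
headlines = {
    'headline 1': {'title': 'abc', 'id': 123, 'score': 0.5},
    'comments 1': [{'author': 'James', 'body': 'He is good!'}],
    'headline 2': {'title': 'def', 'id': 456, 'score': 0.9},
    'comments 2': [
        {'author': 'Ann', 'body': 'Great match.'},
        {'author': 'Bob', 'body': 'Agreed.'},
    ],
}


def to_records(flat):
    """Pair each 'headline N' entry with its 'comments N' list (illustrative helper)."""
    records = []
    for key, value in flat.items():
        if key.startswith('headline '):
            n = key.split(' ', 1)[1]
            records.append({
                'headline': value,
                'comment': flat.get('comments ' + n, []),
            })
    return records


records = to_records(headlines)

# One row per comment, with the parent headline's fields repeated as metadata.
df = pd.json_normalize(
    records,
    record_path='comment',
    meta=[['headline', 'title'], ['headline', 'id'], ['headline', 'score']],
    record_prefix='comment.',
)
print(df)
```

Note that with record_path, a submission whose comment list is empty contributes no rows, so it may be simpler to build the list of records directly inside the scraping loop and keep a separate table of submissions if those need to be preserved.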