Using the facebook_scraper module in Python, I would like to extract the text of the comments on a Facebook page's posts in order to run a sentiment analysis of that page.
With the following use of the module's get_posts function,
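from facebook_scraper import get_posts
import pandas as pd

# Start with an empty frame so there is something to append to
post_df_full = pd.DataFrame()

for post in get_posts('PAGE_NAME', extra_info=True, pages=50, options={"comments": True}):
    post_entry = post
    # One row per post: the dict keys become the columns
    fb_post_df = pd.DataFrame.from_dict(post_entry, orient='index')
    fb_post_df = fb_post_df.transpose()
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    post_df_full = pd.concat([post_df_full, fb_post_df], ignore_index=True)
    print(post['post_id'], 'get')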
it's possible to scrape the post information into the dataframe fb_post_df, which looks like this (condensed to the relevant columns, since the function returns a dataframe with 50 columns):
| post_id | text | ... | comments_full |
|---|---|---|---|
| 12345 | 'text of the post' | ... | [{'comment_id': '12345', 'comment_url': 'https://facebook.com/12345', 'commenter_id': '12345', 'commenter_url': None, 'commenter_name': 'Jane Doe', 'commenter_meta': None, 'comment_text': 'THIS PIECE I NEED, TEXT OF THE COMMENT', 'comment_time': 2022-02-23 10:01:38, 'comment_image': None, 'comment_reactors': [], 'comment_reactions': None, 'comment_reaction_count': None, 'replies': []}] |
The dtype of the column comments_full is object.
I've tried using pandas' from_dict to generate a new dataframe consisting solely of the comment texts, but it fails to recognize the contents of the column as a dictionary, since each cell holds a list of dictionaries rather than a single dictionary.
Please note that the column can be empty if a post has no comments. In that case the content of the column looks like this:
[]
CodePudding user response:
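A list comprehension inside apply should do the trick. It returns the list of comment texts for each post, or 'no comment' when the list is empty: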
post_df_full['comments_full'].apply(lambda x: [y['comment_text'] for y in x] if x else 'no comment')
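If a separate dataframe with one row per comment is more convenient for the sentiment analysis, something along these lines might work. This is only a sketch: the small post_df_full built here is a made-up stand-in with the column layout from the question, and comments_df is a name chosen for illustration. It relies on DataFrame.explode, which turns an empty list into a NaN row that can then be dropped.

```python
import pandas as pd

# Hypothetical stand-in for post_df_full, using the column layout from the question
post_df_full = pd.DataFrame({
    "post_id": ["12345", "67890"],
    "text": ["text of the post", "post without comments"],
    "comments_full": [
        [
            {"comment_id": "1", "comment_text": "first comment", "replies": []},
            {"comment_id": "2", "comment_text": "second comment", "replies": []},
        ],
        [],  # a post with no comments
    ],
})

# explode() gives each list element its own row; an empty list becomes one NaN row
exploded = post_df_full[["post_id", "comments_full"]].explode("comments_full")

# Drop the NaN rows left behind by posts without comments
exploded = exploded.dropna(subset=["comments_full"])

# Each remaining cell is a single comment dict, so the text can be read directly
comments_df = pd.DataFrame({
    "post_id": exploded["post_id"].to_numpy(),
    "comment_text": [c["comment_text"] for c in exploded["comments_full"]],
})

print(comments_df)
```

Keeping post_id next to each comment_text makes it possible to aggregate sentiment scores back to the post level later. Replies nested under 'replies' are not flattened here.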