Sentiment analysis hashtag column conversion

Time:03-29

While learning sentiment analysis, I ended up with a column in my dataframe whose data looks like this:

Index Hashtag_info
0 {'hashtags': [{'text': 'SEVENTEEN', 'indices': [139, 149]}, {'text': 'SVT_POWER_OF_LOVE_THE_MOVIE', 'indices': [150, 178]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'url_text', 'expanded_url': 'url_text', 'display_url': 'url_text', 'indices': [114, 137]}], 'media': [{'id': 1505695832837804032, 'id_str': '1505695832837804032', 'indices': [179, 202], 'media_url': 'url_text', 'media_url_https': 'url_text', 'url': 'url_text', 'display_url': 'url_text', 'expanded_url': 'url_text', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1920, 'h': 1080, 'resize': 'fit'}, 'medium': {'w': 1200, 'h': 675, 'resize': 'fit'}, 'small': {'w': 680, 'h': 383, 'resize': 'fit'}}}]}
1 {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1505486045957300230, 'id_str': '1505486045957300230', 'indices': [264, 287], 'media_url': 'url_text', 'media_url_https': 'url_text', 'url': 'url_text', 'display_url': 'url_text', 'expanded_url': 'url_text', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 467, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1408, 'h': 2048, 'resize': 'fit'}, 'medium': {'w': 825, 'h': 1200, 'resize': 'fit'}}}]}

and so on for nearly 20k rows.

So my question is: how do I convert this column (dtype object, O) into something like this?

Index hashtag_text Type
0 ['SEVENTEEN','SVT_POWER_OF_LOVE_THE_MOVIE'] photo
1 [] video

That is, extract only the hashtag text and the media type.

CodePudding user response:

If the underlying data has no flaws (every row is a dict with at least one media entry), you could use list comprehensions:

pd.DataFrame({'hashtag_text': [[e['text'] for e in x['hashtags']]
                               for x in df['Hashtag_info']],
              'Type': [x['media'][0]['type'] for x in df['Hashtag_info']]
             })
              

output:

                               hashtag_text   Type
0  [SEVENTEEN, SVT_POWER_OF_LOVE_THE_MOVIE]  photo
1                                        []  photo
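If the data does have flaws, a more defensive variant may help. The sketch below assumes two possible problems that the question does not confirm: that some cells hold the dict as a string (which can happen after a CSV round trip), and that some rows have no 'media' entry. The helper names parse_entities, hashtag_texts and first_media_type are made up for illustration.

```python
import ast


def parse_entities(value):
    """Return the entities dict, parsing it first if it is stored as a string."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)  # safe parsing of Python literals
        except (ValueError, SyntaxError):
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}  # NaN, None, or any other unexpected value


def hashtag_texts(entities):
    """Collect the 'text' of every hashtag; empty list if there are none."""
    return [tag["text"] for tag in entities.get("hashtags") or [] if "text" in tag]


def first_media_type(entities):
    """Return the type of the first media item, or None if there is no media."""
    media = entities.get("media") or []
    return media[0].get("type") if media else None


# Possible usage with the dataframe from the question (df is assumed to exist):
# parsed = df["Hashtag_info"].map(parse_entities)
# df["hashtag_text"] = parsed.map(hashtag_texts)
# df["Type"] = parsed.map(first_media_type)
```

Rows without media end up with None in Type instead of raising an IndexError, so they can be inspected or dropped afterwards.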

CodePudding user response:

How about using the map function to create the two new columns you need:

df["hashtag_text"] = df["Hashtag_info"].map(lambda x: [i["text"] for i in x["hashtags"]])
df["Type"] = df["Hashtag_info"].map(lambda x: x["media"][0]["type"])