While learning sentiment analysis, I ended up with a column in my dataframe whose data looks like this:
Index | Hashtag_info |
---|---|
0 | {'hashtags': [{'text': 'SEVENTEEN', 'indices': [139, 149]}, {'text': 'SVT_POWER_OF_LOVE_THE_MOVIE', 'indices': [150, 178]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'url_text', 'expanded_url': 'url_text', 'display_url': 'url_text', 'indices': [114, 137]}], 'media': [{'id': 1505695832837804032, 'id_str': '1505695832837804032', 'indices': [179, 202], 'media_url': 'url_text', 'media_url_https': 'url_text', 'url': 'url_text', 'display_url': 'url_text', 'expanded_url': 'url_text', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1920, 'h': 1080, 'resize': 'fit'}, 'medium': {'w': 1200, 'h': 675, 'resize': 'fit'}, 'small': {'w': 680, 'h': 383, 'resize': 'fit'}}}]} |
1 | {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1505486045957300230, 'id_str': '1505486045957300230', 'indices': [264, 287], 'media_url': 'url_text', 'media_url_https': 'url_text', 'url': 'url_text', 'display_url': 'url_text', 'expanded_url': 'url_text', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 467, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1408, 'h': 2048, 'resize': 'fit'}, 'medium': {'w': 825, 'h': 1200, 'resize': 'fit'}}}]} |
and so on for nearly 20k rows.
So my question is: how can I convert this column (dtype object, O) into something like this?
Index | hashtag_text | Type |
---|---|---|
0 | ['SEVENTEEN','SVT_POWER_OF_LOVE_THE_MOVIE'] | photo |
1 | [] | video |
That is, I only want to extract the hashtag text and the media type.
CodePudding user response:
If the underlying data has no flaws (every row has a 'hashtags' list and at least one 'media' entry), you could use list comprehensions:
pd.DataFrame({'hashtag_text': [[e['text'] for e in x['hashtags']]
                               for x in df['Hashtag_info']],
              'Type': [x['media'][0]['type'] for x in df['Hashtag_info']]
             })
output:
hashtag_text Type
0 [SEVENTEEN, SVT_POWER_OF_LOVE_THE_MOVIE] photo
1 [] photo
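If some rows do have flaws (no 'media' key, an empty 'media' list, or a missing value instead of a dict), the comprehension above would raise a KeyError, IndexError, or TypeError. Below is a minimal sketch of a more defensive variant. The helper names `extract_hashtags` and `extract_media_type` and the small sample dataframe are made up for illustration, and returning `None` for a missing type is an assumption about what you would want there.

```python
import pandas as pd

def extract_hashtags(info):
    """Return the list of hashtag texts, or [] if nothing usable is present."""
    if not isinstance(info, dict):
        return []
    return [tag["text"] for tag in info.get("hashtags", []) if "text" in tag]

def extract_media_type(info):
    """Return the type of the first media entry, or None if there is none."""
    if not isinstance(info, dict):
        return None
    media = info.get("media") or []
    return media[0].get("type") if media else None

# Tiny stand-in for the real dataframe (hypothetical sample rows)
df = pd.DataFrame({"Hashtag_info": [
    {"hashtags": [{"text": "SEVENTEEN", "indices": [139, 149]}],
     "media": [{"type": "photo"}]},
    {"hashtags": [], "media": []},   # empty media list
    {"hashtags": []},                # no media key at all
    None,                            # missing value
]})

result = pd.DataFrame({
    "hashtag_text": [extract_hashtags(x) for x in df["Hashtag_info"]],
    "Type": [extract_media_type(x) for x in df["Hashtag_info"]],
}, index=df.index)
```

Rows without usable media end up with `None` in `Type`, so you can filter or fill them afterwards instead of having the whole extraction fail.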
CodePudding user response:
How about using the map function to create the two new columns you need:
df["hashtag_text"] = df["Hashtag_info"].map(lambda x: [i["text"] for i in x["hashtags"]])
df["Type"] = df["Hashtag_info"].map(lambda x: x["media"][0]["type"])
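One caveat: the question says the column has dtype object, which can hold either real dicts or their string representation (for example after reading the data back from a CSV). If the values turn out to be strings, indexing with `x["hashtags"]` will fail, and the strings need to be parsed first. The sketch below assumes the strings are Python literals with single quotes, as in the question, so it uses `ast.literal_eval` rather than `json.loads`. The helper name `to_dict` and the sample rows are hypothetical.

```python
import ast
import pandas as pd

def to_dict(value):
    """Parse a Python-literal string into a dict; pass other values through."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

# Hypothetical sample where the column holds strings, not dicts
raw = pd.DataFrame({"Hashtag_info": [
    "{'hashtags': [{'text': 'SEVENTEEN', 'indices': [139, 149]}], 'media': [{'type': 'photo'}]}",
    "{'hashtags': [], 'media': [{'type': 'photo'}]}",
]})

# Parse once, then apply the same map calls as above
parsed = raw["Hashtag_info"].map(to_dict)
hashtag_text = parsed.map(lambda x: [i["text"] for i in x["hashtags"]])
media_type = parsed.map(lambda x: x["media"][0]["type"])
```

You can check which case you are in with `type(df["Hashtag_info"].iloc[0])` before deciding whether the parsing step is needed.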