I have a dataframe df with a single column, events, like below:
|events|
|{'id': 109421132110384, 'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190455'}|
|{'id': 109421132340384, 'created_at': datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190467'}|
I tried the approach below:
a= df['events'][0]
print(a['id'])
This raises an error: TypeError: string indices must be integers
The data types are:
print(type(df['events'][0]))
<class 'str'>
print(type(df['events']))
<class 'pandas.core.series.Series'>
print(type(df))
<class 'pandas.core.frame.DataFrame'>
I want to extract id, created_at and in_reply_to_id into new columns of the same dataframe, one value per record.
Please help. Many thanks in advance.
CodePudding user response:
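You can try this.
Access the first row with the .loc indexer, then the events column, and finally the id key of the dict.
This assumes each cell holds an actual dict; if the cell is a string, it raises a TypeError like the one in the question.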
df.loc[0]["events"]["id"] # 109421132110384
CodePudding user response:
You can use the .iloc indexer to access the row and then specify the column name:
import datetime
from dateutil.tz import tzutc
import pandas as pd
df = pd.DataFrame({"events": [
{'id': 109421132110384, 'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190455'},
{'id': 109421132340384, 'created_at': datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190467'}
]})
print(df.iloc[0]["events"])
Output:
{'id': 109421132110384,
'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()),
'in_reply_to_id': None,
'in_reply_to_account_id': None,
'sensitive': False,
'spoiler_text': '',
'visibility': 'public',
'language': 'en',
'uri': 'https://users/statuses/10942113190455'}
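To go from there to the new columns asked for in the question, something like the following might work. It is only a sketch and assumes, as in the example above, that every cell is a real dict and that each dict contains all three keys. The sample data is shortened to the relevant keys.

```python
import datetime

import pandas as pd
from dateutil.tz import tzutc

# Shortened version of the sample data; each cell is a real dict.
df = pd.DataFrame({"events": [
    {"id": 109421132110384,
     "created_at": datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()),
     "in_reply_to_id": None},
    {"id": 109421132340384,
     "created_at": datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=tzutc()),
     "in_reply_to_id": None},
]})

# Pull each wanted key out of every dict into its own column.
for key in ["id", "created_at", "in_reply_to_id"]:
    df[key] = [event[key] for event in df["events"]]

print(df.drop(columns="events"))
```

If some dicts may lack a key, event.get(key) in place of event[key] would leave None in that row rather than raising a KeyError.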
CodePudding user response:
Your 'events' column holds the printed (string) representation of a dictionary, not a dictionary.
An ugly solution would be to map eval over the column to create a new column. Make sure datetime and tzutc are imported first, since the strings reference both.
df['parsed_events'] = df['events'].map(eval)
Once parsed, you can treat the contents as a dictionary. The better fix would be to revisit how the original data was generated.
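As a rough sketch of the eval approach carried through to the requested columns: the sample strings below are shortened, made-up stand-ins for the real cells, and the namespace dict is an assumption of this example. It hands eval the two names the strings refer to explicitly, so the result does not depend on what happens to be in scope where eval runs. ast.literal_eval is not an option here because the strings contain function calls (datetime.datetime(...), tzutc()), which it rejects.

```python
import datetime

import pandas as pd
from dateutil.tz import tzutc

# Hypothetical sample data: each cell is the *string* form of a dict,
# matching the <class 'str'> reported in the question.
df = pd.DataFrame({"events": [
    "{'id': 109421132110384, 'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None}",
    "{'id': 109421132340384, 'created_at': datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None}",
]})

# Names the strings refer to, passed to eval explicitly.
namespace = {"datetime": datetime, "tzutc": tzutc}

# WARNING: eval executes arbitrary code; only use it on data you trust.
parsed = df["events"].map(lambda s: eval(s, namespace))

# Copy the wanted keys into new columns of the same dataframe.
for key in ["id", "created_at", "in_reply_to_id"]:
    df[key] = parsed.map(lambda d, k=key: d[k])

print(df[["id", "created_at", "in_reply_to_id"]])
```

The k=key default argument binds the current key to each lambda, which avoids the usual late-binding surprise with lambdas created in a loop.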