Access the values of a dict in a pandas DataFrame

Time:12-07

I have a DataFrame df with a single column, events, shown below:

|events|
|{'id': 109421132110384, 'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190455'}|
|{'id': 109421132340384, 'created_at': datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190467'}|

I tried the approach below:

a= df['events'][0]
print(a['id'])

This raises an error: TypeError: string indices must be integers

The types returned are:

print(type(df['events'][0])) 
<class 'str'>
print(type(df['events']))
<class 'pandas.core.series.Series'>
print(type(df))
<class 'pandas.core.frame.DataFrame'>

I want to extract id, created_at, and in_reply_to_id into new columns of the same DataFrame, one value per record.

Please help. Many thanks in advance.

CodePudding user response:

You can try this.

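Access the first row with the .loc indexer (it takes square brackets, not a method call).

Then select the events column and finally the id key of the dict. This only works if the column holds actual dict objects rather than their string form.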

df.loc[0]["events"]["id"]  # 109421132110384

CodePudding user response:

You can use .iloc to access the row by position and then select the column by name:

import datetime
from dateutil.tz import tzutc
import pandas as pd

df = pd.DataFrame({"events": [
    {'id': 109421132110384, 'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190455'},
    {'id': 109421132340384, 'created_at': datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=tzutc()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://users/statuses/10942113190467'}
]})

print(df.iloc[0]["events"])

Output:

{'id': 109421132110384,
 'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()),
 'in_reply_to_id': None,
 'in_reply_to_account_id': None,
 'sensitive': False,
 'spoiler_text': '',
 'visibility': 'public',
 'language': 'en',
 'uri': 'https://users/statuses/10942113190455'}
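If the column already holds dict objects, as in the frame built above, the requested keys can be expanded into new columns. This is a minimal sketch, assuming every row is a dict that contains the wanted keys. The names wanted and expanded are illustrative, and datetime.timezone.utc stands in for tzutc() to keep the example short:

```python
import datetime
import pandas as pd

utc = datetime.timezone.utc  # stand-in for dateutil's tzutc()

df = pd.DataFrame({"events": [
    {"id": 109421132110384,
     "created_at": datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=utc),
     "in_reply_to_id": None, "language": "en"},
    {"id": 109421132340384,
     "created_at": datetime.datetime(2022, 11, 30, 11, 12, 50, tzinfo=utc),
     "in_reply_to_id": None, "language": "en"},
]})

# Keys to pull out of each dict (illustrative selection).
wanted = ["id", "created_at", "in_reply_to_id"]

# Build one column per dict key, aligned on the original index,
# then keep only the wanted columns and attach them to df.
expanded = pd.DataFrame(df["events"].tolist(), index=df.index)[wanted]
df = df.join(expanded)

print(df[wanted])
```

Reusing df.index when building expanded keeps the new columns aligned with their source rows even if the index is not 0..n-1.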

CodePudding user response:

Your 'events' column holds the string representation of a dictionary, not a dictionary.
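A small illustration of why the lookup fails, using a shortened, made-up event (event_text and event_dict are illustrative names): indexing a str with a key raises the TypeError from the question, while the same lookup on a real dict works.

```python
# The same content, once as a string and once as a real dict.
event_text = "{'id': 109421132110384, 'language': 'en'}"
event_dict = {'id': 109421132110384, 'language': 'en'}

try:
    event_text['id']          # a str only accepts integer or slice indices
except TypeError as exc:
    message = str(exc)
    print("str lookup failed:", message)

print("dict lookup:", event_dict['id'])
```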

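An ugly solution would be to map eval over the column to create a new one. The strings reference datetime.datetime and tzutc, so both names must be available to eval (import datetime, and from dateutil.tz import tzutc). Note that eval executes whatever code the string contains, so only use it on data you trust.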

df['parsed_events'] = df['events'].map(eval)

Once parsed, you can treat the contents as a dictionary. The better fix is to revisit how the original data was generated, so the dicts are never turned into strings in the first place.
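Putting the parsing step and the requested columns together could look like the sketch below. It assumes the strings come from a trusted source and have the shape shown in the question. parse_event and names are hypothetical helpers, and the event string is shortened. The names the strings need are passed to eval explicitly, so the lookup does not depend on where eval happens to be called from. Emptying __builtins__ is not a real sandbox.

```python
import datetime
import pandas as pd
from dateutil.tz import tzutc

# Shortened event string in the same shape as the question's data.
raw = ("{'id': 109421132110384, "
       "'created_at': datetime.datetime(2022, 11, 28, 11, 12, 50, tzinfo=tzutc()), "
       "'in_reply_to_id': None}")
df = pd.DataFrame({"events": [raw]})

# Names the strings refer to; eval resolves them from this mapping.
names = {"datetime": datetime, "tzutc": tzutc}

def parse_event(text):
    """Rebuild a dict from its printed form. eval runs arbitrary code: trusted input only."""
    return eval(text, {"__builtins__": {}}, names)

df["parsed_events"] = df["events"].map(parse_event)

# One new column per requested key; .get returns None for a missing key.
for key in ["id", "created_at", "in_reply_to_id"]:
    df[key] = df["parsed_events"].map(lambda d, k=key: d.get(k))

print(df[["id", "created_at", "in_reply_to_id"]])
```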
