Pandas Create dict from Dataframe


How can I convert this pandas DataFrame

Into this dictionary:


That is, I need to group by person_nid and behavior_type and build a nested dictionary. Each person_nid is a key whose value is another dictionary. In that inner dictionary, the keys are the behavior_type values, and the value for each behavior_type is a list of dicts, one per record in the group.

CodePudding user response:

To get this, you can filter your DataFrame for each unique person_nid and then, on the filtered DataFrame, collect the rows for each behaviour_type. You can achieve this with the following code:

   import pandas as pd

   # read your dataframe inside the df variable

   final_dict = dict()
   not_nested= ["person_nid", "behaviour_type"]
   final_cols= list()

   # obtain the inner columns
   for col in df.columns:
        if col not in not_nested:
             final_cols.append(col)

   # iterate over all the person_nid
   for nid in df["person_nid"].unique():

        # dataframe containing only the rows with the current person_nid
        nid_df = df[df["person_nid"]==nid]
        # create nested dict for current person_nid
        final_dict[nid] = dict()

        # get all the behaviour_type for the current nid
        for btype in nid_df["behaviour_type"].unique():

                # dataframe with the rows for the inner dictionaries
                b_df = nid_df[nid_df["behaviour_type"]==btype][final_cols]
                # create list of dictionaries for the current behaviour_type
                final_dict[nid][btype] = list()
                # create one dictionary per row and append it to the list
                for index, row in b_df.iterrows():
                        new_dict = dict()
                        for col in b_df.columns:
                                new_dict[col] = row[col]
                        final_dict[nid][btype].append(new_dict)

If you don't want to build the final dictionaries yourself, you could store a list of the rows returned by b_df.iterrows(), since each row is a Series whose values can be accessed like a dictionary's, with row[colname]. If you want to avoid explicit loops, you can use groupby and apply, as in the following code, to produce the same result:

not_nested= ["person_nid", "behaviour_type"]
final_cols= list()
for col in df.columns:
   if col not in not_nested:
      final_cols.append(col)

def external_apply(df):
    df= df.groupby("behaviour_type").apply(inner_apply)
    return df.to_dict()
def inner_apply(df):
    return df[final_cols].to_dict(orient="records")

final_dict =df.groupby("person_nid").apply(external_apply).to_dict()
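
To check the shape of the result, here is a minimal sketch of the same groupby/apply idea on a made-up DataFrame. Only person_nid and behaviour_type come from the question; the duration and place columns and all the values are invented for illustration, so treat this as a sketch under those assumptions rather than the asker's real data. Depending on your pandas version, apply may emit a deprecation warning about operating on the grouping columns.

```python
import pandas as pd

# Hypothetical sample data: "duration" and "place" are invented columns.
df = pd.DataFrame({
    "person_nid": [1, 1, 1, 2],
    "behaviour_type": ["walk", "walk", "run", "walk"],
    "duration": [10, 20, 5, 7],
    "place": ["park", "street", "gym", "park"],
})

not_nested = ["person_nid", "behaviour_type"]
final_cols = [col for col in df.columns if col not in not_nested]

def inner_apply(group):
    # one dict per row, keeping only the non-grouping columns
    return group[final_cols].to_dict(orient="records")

def external_apply(group):
    # {behaviour_type: [row dicts]} for a single person_nid
    return group.groupby("behaviour_type").apply(inner_apply).to_dict()

final_dict = df.groupby("person_nid").apply(external_apply).to_dict()

# final_dict[1]["run"] is [{'duration': 5, 'place': 'gym'}]
print(final_dict)
```

Each top-level key is a person_nid, each inner key is a behaviour_type, and each inner value is a list with one dict per original row.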

CodePudding user response:

Something like nested groupby calls followed by to_dict('records') could work.

result = {}
for g,g_hold in df.groupby('person_nid'):
    for g2,g2_hold in g_hold[[h for h in g_hold if h != 'person_nid']].groupby('behaviour_type'):
        if g in result:
            result[g][g2] = g2_hold[[h for h in g2_hold if h != 'behaviour_type']].to_dict('records')
        else:
            result[g] = {g2:g2_hold[[h for h in g2_hold if h != 'behaviour_type']].to_dict('records')}
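
The same idea can probably be written more compactly by grouping on both keys at once and letting dict.setdefault create the inner dictionary on first use. This is only a sketch: the duration and place columns and the sample values are hypothetical, and only person_nid and behaviour_type are taken from the question.

```python
import pandas as pd

# Hypothetical sample data: "duration" and "place" are invented columns.
df = pd.DataFrame({
    "person_nid": [1, 1, 1, 2],
    "behaviour_type": ["walk", "walk", "run", "walk"],
    "duration": [10, 20, 5, 7],
    "place": ["park", "street", "gym", "park"],
})

keys = ["person_nid", "behaviour_type"]
# columns that end up inside the per-record dicts
value_cols = [c for c in df.columns if c not in keys]

result = {}
# grouping by a list of two columns yields (person_nid, behaviour_type) tuples
for (nid, btype), group in df.groupby(keys):
    # setdefault creates the inner dict the first time a person_nid is seen
    result.setdefault(nid, {})[btype] = group[value_cols].to_dict("records")

print(result[2])  # [{'duration': 7, 'place': 'park'}] under key 'walk'
```

Using setdefault removes the need for the if/else branch, and grouping on both columns in one call avoids filtering the column list twice.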