Pandas: Create dict from DataFrame

How can I convert this pandas DataFrame

into this dictionary:

{"114811":{
  "password-manager-active":[
    {
      "ref_date":"2021-10-04",
      "value":"0",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"3",
      "60_months":"3"
    },
    {
      "ref_date":"2021-10-04",
      "value":"1",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    }
  ],
  "password-manager-installed":[
    {
      "ref_date":"2021-10-04",
      "value":"0",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    },
    {
      "ref_date":"2021-10-04",
      "value":"1",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"1",
      "60_months":"1"
    }
  ],
  "password-manager-last-used":[
    {
      "ref_date":"2021-10-04",
      "value":"1",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    }
  ],
  "password-manager-multifactor":[
    {
      "ref_date":"2021-10-04",
      "value":"0",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    },
    {
      "ref_date":"2021-10-04",
      "value":"1",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    }
  ],
  "password-manager-reset-required":[
    {
      "ref_date":"2021-10-04",
      "value":"0",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    },
    {
      "ref_date":"2021-10-04",
      "value":"1",
      "1_months":"0",
      "2_months":"0",
      "6_months":"0",
      "12_months":"0",
      "24_months":"0",
      "60_months":"0"
    }
  ]
}}

That is, I need to group the records by person_nid and behavior_type. The result should be a dictionary keyed by person_nid, where each value is a nested dictionary keyed by behavior_type, and each value in that nested dictionary is a list of dicts, one per record in the group.

CodePudding user response:

To get this structure, you can filter your DataFrame for each unique person_nid and then, on the filtered DataFrame, collect the rows for each behaviour_type. You can achieve this with the following code:

   import pandas as pd

   # read your dataframe inside the df variable

   final_dict = dict()
   not_nested = ["person_nid", "behaviour_type"]

   # obtain the inner columns
   final_cols = [col for col in df.columns if col not in not_nested]

   # iterate over all the person_nid
   for nid in df["person_nid"].unique():

       # dataframe containing only the rows with the current person_nid
       nid_df = df[df["person_nid"] == nid]
       # create nested dict for current person_nid
       final_dict[nid] = dict()

       # get all the behaviour_type for the current nid
       for btype in nid_df["behaviour_type"].unique():

           # create list of dictionaries for the current behaviour_type
           final_dict[nid][btype] = list()
           # dataframe with the rows for the inner dictionaries
           b_df = nid_df[nid_df["behaviour_type"] == btype][final_cols]
           # create all the dictionaries inside the list
           for index, row in b_df.iterrows():
               new_dict = dict()
               for col in b_df.columns:
                   new_dict[col] = row[col]
               final_dict[nid][btype].append(new_dict)
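
The innermost loop above builds each record dictionary key by key. As a minimal sketch, the same list can be produced with DataFrame.to_dict(orient="records"). The small b_df here is a hypothetical stand-in with only three of the columns, since the original DataFrame is not shown:

```python
import pandas as pd

# Hypothetical sample rows (assumed data, not the asker's real frame).
b_df = pd.DataFrame(
    {
        "ref_date": ["2021-10-04", "2021-10-04"],
        "value": ["0", "1"],
        "24_months": ["3", "0"],
    }
)

# Manual version, mirroring the loop above: one dict per row.
manual = []
for _, row in b_df.iterrows():
    manual.append({col: row[col] for col in b_df.columns})

# Built-in version: one dict per row, keyed by column name.
records = b_df.to_dict(orient="records")

# Compare the two lists of dictionaries.
print(records == manual)
```

With this in place, the two innermost loops could shrink to a single assignment of b_df.to_dict(orient="records") to final_dict[nid][btype].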

If you don't need the inner dictionaries, you could instead store the rows yielded by b_df.iterrows(): each row is a pandas Series, so its values can be read the same way as a dictionary's, with row[colname]. If you want to avoid the explicit loops, you can use groupby and apply, as in the following code, to produce the same result:

not_nested = ["person_nid", "behaviour_type"]
final_cols = [col for col in df.columns if col not in not_nested]

def inner_apply(df):
    # one dict per row, keeping only the non-grouping columns
    return df[final_cols].to_dict(orient="records")

def external_apply(df):
    # for one person_nid: map each behaviour_type to its list of records
    return df.groupby("behaviour_type").apply(inner_apply).to_dict()

final_dict = df.groupby("person_nid").apply(external_apply).to_dict()

CodePudding user response:

Grouping at both levels and then calling to_dict('records') on each inner group could work:

result = {}
for g, g_hold in df.groupby('person_nid'):
    for g2, g2_hold in g_hold[[h for h in g_hold if h != 'person_nid']].groupby('behaviour_type'):
        # keep every column except the grouping key in the records
        records = g2_hold[[h for h in g2_hold if h != 'behaviour_type']].to_dict('records')
        if g in result:
            result[g][g2] = records
        else:
            result[g] = {g2: records}
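
To check the output against the target in the question, here is a rough, self-contained sketch of the same two-level grouping written as a dict comprehension. The sample frame is an assumption: three rows and a subset of the columns copied from the target dictionary, with the "behaviour_type" spelling taken from the answers' code.

```python
import json

import pandas as pd

# Hypothetical input rebuilt from the question's target (subset of columns).
df = pd.DataFrame(
    [
        ["114811", "password-manager-active", "2021-10-04", "0", "3"],
        ["114811", "password-manager-active", "2021-10-04", "1", "0"],
        ["114811", "password-manager-last-used", "2021-10-04", "1", "0"],
    ],
    columns=["person_nid", "behaviour_type", "ref_date", "value", "24_months"],
)

keys = ["person_nid", "behaviour_type"]

# Outer level: one entry per person_nid.
# Inner level: one list of record dicts per behaviour_type.
result = {
    nid: {
        btype: rows.drop(columns=keys).to_dict("records")
        for btype, rows in person_rows.groupby("behaviour_type")
    }
    for nid, person_rows in df.groupby("person_nid")
}

# Serialize to JSON to compare with the target layout.
print(json.dumps(result, indent=2))
```

If person_nid holds integers rather than strings, the group keys may be NumPy integers, which json.dumps does not accept as dictionary keys; casting the column with astype(str) first avoids that.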