Home > Back-end >  How to do custom window function on JSON object with pandas?
How to do custom window function on JSON object with pandas?

Time:05-09

I have a rather nested JSON object below, and I am trying to calculate the user (ie 'profileId') with the most events (ie length of 'parameters' key.

I have the code below to get the length of the parameter, but I am trying to now have that calculation be correct for each record, as they way I have it set now would set it the same value for each record - I looked into pandas window functions https://pandas.pydata.org/docs/user_guide/window.html but am having trouble getting to the correct outcome.

    response = response.json()
    df = pd.json_normalize(response['items'])
    df['calcfield'] = len(df["events"].iloc[0][0].get('parameters'))
    

the output of df['arrayfield'] is below:

[
        {
           "type":"auth",
           "name":"activity",
           "parameters":[
              {
                 "name":"api_name",
                 "value":"admin"
              },
              {
                 "name":"method_name",
                 "value":"directory.users.list"
              },
              {
                 "name":"client_id",
                 "value":"722230783769-dsta4bi9fkom72qcu0t34aj3qpcoqloq.apps.googleusercontent.com"
              },
              {
                 "name":"num_response_bytes",
                 "intValue":"7158"
              },
              {
                 "name":"product_bucket",
                 "value":"GSUITE_ADMIN"
              },
              {
                 "name":"app_name",
                 "value":"Untitled project"
              },
              {
                 "name":"client_type",
                 "value":"WEB"
              }
           ]
        }
     ]   },   {
     "kind":"admin#reports#activity",
     "id":{
        "time":"2022-05-05T23:58:48.914Z",
        "uniqueQualifier":"-4002873813067783265",
        "applicationName":"token",
        "customerId":"C02f6wppb"
     },
     "etag":"\"5T53xK7dpLei95RNoKZd9uz5Xb8LJpBJb72fi2HaNYM/9DTdB8t7uixvUbjo4LUEg53_gf0\"",
     "actor":{
        "email":"[email protected]",
        "profileId":"100230688039070881323"
     },
     "ipAddress":"54.80.168.30",
     "events":[
        {
           "type":"auth",
           "name":"activity",
           "parameters":[
              {
                 "name":"api_name",
                 "value":"gmail"
              },
              {
                 "name":"method_name",
                 "value":"gmail.users.messages.list"
              },
              {
                 "name":"client_id",
                 "value":"927538837578.apps.googleusercontent.com"
              },
              {
                 "name":"num_response_bytes",
                 "intValue":"2"
              },
              {
                 "name":"product_bucket",
                 "value":"GMAIL"
              },
              {
                 "name":"app_name",
                 "value":"Zapier"
              },
              {
                 "name":"client_type",
                 "value":"WEB"
              }
           ]

ORIGINAL JSON BLOB I READ IN

{
   "kind":"admin#reports#activities",
   "etag":"\"5g8\"",
   "nextPageToken":"A:1651795128914034:-4002873813067783265:151219070090:C02f6wppb",
   "items":[
      {
         "kind":"admin#reports#activity",
         "id":{
            "time":"2022-05-05T23:59:39.421Z",
            "uniqueQualifier":"5526793068617678141",
            "applicationName":"token",
            "customerId":"cds"
         },
         "etag":"\"jkYcURYoi8\"",
         "actor":{
            "email":"[email protected]",
            "profileId":"1323"
         },
         "ipAddress":"107.178.193.87",
         "events":[
            {
               "type":"auth",
               "name":"activity",
               "parameters":[
                  {
                     "name":"api_name",
                     "value":"admin"
                  },
                  {
                     "name":"method_name",
                     "value":"directory.users.list"
                  },
                  {
                     "name":"client_id",
                     "value":"722230783769-dsta4bi9fkom72qcu0t34aj3qpcoqloq.apps.googleusercontent.com"
                  },
                  {
                     "name":"num_response_bytes",
                     "intValue":"7158"
                  },
                  {
                     "name":"product_bucket",
                     "value":"GSUITE_ADMIN"
                  },
                  {
                     "name":"app_name",
                     "value":"Untitled project"
                  },
                  {
                     "name":"client_type",
                     "value":"WEB"
                  }
               ]
            }
         ]
      },
      {
         "kind":"admin#reports#activity",
         "id":{
            "time":"2022-05-05T23:58:48.914Z",
            "uniqueQualifier":"-4002873813067783265",
            "applicationName":"token",
            "customerId":"df"
         },
         "etag":"\"5T53xK7dpLei95RNoKZd9uz5Xb8LJpBJb72fi2HaNYM/9DTdB8t7uixvUbjo4LUEg53_gf0\"",
         "actor":{
            "email":"[email protected]",
            "profileId":"1324"
         },
         "ipAddress":"54.80.168.30",
         "events":[
            {
               "type":"auth",
               "name":"activity",
               "parameters":[
                  {
                     "name":"api_name",
                     "value":"gmail"
                  },
                  {
                     "name":"method_name",
                     "value":"gmail.users.messages.list"
                  },
                  {
                     "name":"client_id",
                     "value":"927538837578.apps.googleusercontent.com"
                  },
                  {
                     "name":"num_response_bytes",
                     "intValue":"2"
                  },
                  {
                     "name":"product_bucket",
                     "value":"GMAIL"
                  },
                  
                  {
                     "name":"client_type",
                     "value":"WEB"
                  }
               ]
            }
         ]
      }
   ]
}

CodePudding user response:

It's not entirely clear what you asking and df['arrayfield'] isn't in your example provided. However, if you look at the events column after json_normalize, you can use the following line to pull out the length of each parameters key. The blob you gave as an example was set to response...

df = pd.json_normalize(response['items'])
df['calcfield'] = df['events'].str[0].str.get('parameters').str.len()

Becauase each parameters key has 7 elements, it's tough to say this is what you really want.

CodePudding user response:

Use:

df.groupby('actor.profileId')['events'].apply(lambda x: [len(x.iloc[i][0]['parameters']) for i in range(len(x))])

which returns the list of each profileid count of parameters. Output and the sample data:

actor.profileId
1323    [7]
1324    [7]
Name: events, dtype: object
  • Related