Home > Software design >  How to group by and calculate sum in pandas
How to group by and calculate sum in pandas

Time:04-08

I am stuck at below point not getting how do I get expected output

I have used pandas to read json data

code :

import pandas as pd

dt = pd.read_json('/content/sample_data/abc.json')
mycolumns = ['A','B']
dt[mycolumns ]

Below is the following data I have read it using pandas :

0   {'$date': '2020-12-02T12:32:57.781Z'}   {'$date': '2020-12-02T13:32:57.781Z'}
1   {'$date': '2020-12-02T12:32:59.955Z'}   {'$date': '2020-12-02T13:32:59.955Z'}
2   {'$date': '2020-12-04T12:33:13.686Z'}   {'$date': '2020-12-04T14:33:13.686Z'}
3   {'$date': '2020-12-04T12:33:16.199Z'}   {'$date': '2020-12-04T14:33:16.199Z'}

Now I need to do group by on date : YYY-MM-DD and calculate difference of time in hours and do sum on it

Trying to get into below format so that I can do some difference of A - B

Note : I have given dummy difference just for reference not able to identify how to implement

    DATE         A                                     B                 Difference         
0   2020-12-02  '2020-12-02T12:32:57.781Z'  '2020-12-02T13:32:57.781Z'      3              
1   2020-12-02  '2020-12-02T12:32:59.955Z'  '2020-12-02T13:32:59.955Z'      4    
2   2020-12-04  '2020-12-02T13:33:13.686Z'  '2020-12-04T14:33:13.686Z'      2
3   2020-12-04  '2020-12-02T13:33:16.199Z'  '2020-12-04T14:33:13.686Z'      1 

MY expected output

    DATE          Sum        
0   2020-12-02     7              
1   2020-12-04     3  

I am providing data in csv format just use it

0,{'$date': '2020-12-02T12:32:57.781Z'},{'$date': '2020-12-02T13:32:57.781Z'}
1,{'$date': '2020-12-02T12:32:59.955Z'},{'$date': '2020-12-02T13:32:59.955Z'}
2,{'$date': '2020-12-04T12:33:13.686Z'},{'$date': '2020-12-04T14:33:13.686Z'}
3,{'$date': '2020-12-04T12:33:16.199Z'},{'$date': '2020-12-04T14:33:13.686Z'}

Please find data in dict format

{'__v': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 '_id': {0: {'$oid': '5fc7897969b8ab14e1cad8f4'},
  1: {'$oid': '5fc7897b69b8ab14e1cad8f5'},
  2: {'$oid': '5fc7898969b8ab14e1cad8fa'},
  3: {'$oid': '5fc7898c69b8ab14e1cad8fb'},
  4: {'$oid': '5fca09bb45db2d39eee040b2'}},
 'createdAt': {0: {'$date': '2020-12-02T12:32:57.781Z'},
  1: {'$date': '2020-12-02T12:32:59.955Z'},
  2: {'$date': '2020-12-02T12:33:13.686Z'},
  3: {'$date': '2020-12-02T12:33:16.199Z'},
  4: {'$date': '2020-12-04T10:04:43.662Z'}},
 'dId': {0: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
  1: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
  2: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
  3: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
  4: {'$oid': '5fc7894ff7fa2b262ecb6c45'}},
 'm': {0: '24:6F:28:55:7C:55',
  1: '24:6F:28:55:7C:55',
  2: '24:6F:28:55:7C:55',
  3: '24:6F:28:55:7C:55',
  4: '24:6F:28:55:7C:55'},
 'pId': {0: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
  1: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
  2: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
  3: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
  4: {'$oid': '5fc9f66cf7fa2b262ecb6c5a'}},
 'pVal': {0: {'__v': 0,
   '_id': '5fc7894ff7fa2b262ecb6c46',
   'cat': 's',
   'createdAt': '2020-12-02T12:32:15.237Z',
   'device': {'__v': 0,
    '_id': '5fc7894ff7fa2b262ecb6c45',
    'assignedTo': '5fc62223f7fa2b262ecb6c2d',
    'createdAt': '2020-12-02T12:32:15.235Z',
    'dType': '5ee38d3a4afdf74cf8b07dfb',
    'deviceId': 'LTAC21220050009',
    'macId': '24:6F:28:55:7C:55',
    'ports': ['5fc7894ff7fa2b262ecb6c46'],
    'updatedAt': '2020-12-02T12:32:36.773Z'},
   'mac': '24:6F:28:55:7C:55',
   'no': '0',
   'portIcon': 'AC',
   'portType': 'AC',
   'status': None,
   'title': 'Switch',
   'updatedAt': '2020-12-02T12:32:57.776Z',
   'val': 1},
  1: {'__v': 0,
   '_id': '5fc7894ff7fa2b262ecb6c46',
   'cat': 's',
   'createdAt': '2020-12-02T12:32:15.237Z',
   'device': {'__v': 0,
    '_id': '5fc7894ff7fa2b262ecb6c45',
    'assignedTo': '5fc62223f7fa2b262ecb6c2d',
    'createdAt': '2020-12-02T12:32:15.235Z',
    'dType': '5ee38d3a4afdf74cf8b07dfb',
    'deviceId': 'LTAC21220050009',
    'macId': '24:6F:28:55:7C:55',
    'ports': ['5fc7894ff7fa2b262ecb6c46'],
    'updatedAt': '2020-12-02T12:32:36.773Z'},
   'mac': '24:6F:28:55:7C:55',
   'no': '0',
   'portIcon': 'AC',
   'portType': 'AC',
   'status': None,
   'title': 'Switch',
   'updatedAt': '2020-12-02T12:32:59.951Z',
   'val': 0},
  2: {'__v': 0,
   '_id': '5fc7894ff7fa2b262ecb6c46',
   'cat': 's',
   'createdAt': '2020-12-02T12:32:15.237Z',
   'device': {'__v': 0,
    '_id': '5fc7894ff7fa2b262ecb6c45',
    'assignedTo': '5fc62223f7fa2b262ecb6c2d',
    'createdAt': '2020-12-02T12:32:15.235Z',
    'dType': '5ee38d3a4afdf74cf8b07dfb',
    'deviceId': 'LTAC21220050009',
    'macId': '24:6F:28:55:7C:55',
    'ports': ['5fc7894ff7fa2b262ecb6c46'],
    'updatedAt': '2020-12-02T12:32:36.773Z'},
   'mac': '24:6F:28:55:7C:55',
   'no': '0',
   'portIcon': 'AC',
   'portType': 'AC',
   'status': None,
   'title': 'Switch',
   'updatedAt': '2020-12-02T12:33:13.682Z',
   'val': 1},
  3: {'__v': 0,
   '_id': '5fc7894ff7fa2b262ecb6c46',
   'cat': 's',
   'createdAt': '2020-12-02T12:32:15.237Z',
   'device': {'__v': 0,
    '_id': '5fc7894ff7fa2b262ecb6c45',
    'assignedTo': '5fc62223f7fa2b262ecb6c2d',
    'createdAt': '2020-12-02T12:32:15.235Z',
    'dType': '5ee38d3a4afdf74cf8b07dfb',
    'deviceId': 'LTAC21220050009',
    'macId': '24:6F:28:55:7C:55',
    'ports': ['5fc7894ff7fa2b262ecb6c46'],
    'updatedAt': '2020-12-02T12:32:36.773Z'},
   'mac': '24:6F:28:55:7C:55',
   'no': '0',
   'portIcon': 'AC',
   'portType': 'AC',
   'status': None,
   'title': 'Switch',
   'updatedAt': '2020-12-02T12:33:16.195Z',
   'val': 0},
  4: {'__v': 0,
   '_id': '5fc9f66cf7fa2b262ecb6c5a',
   'cat': 's',
   'createdAt': '2020-12-04T08:42:20.724Z',
   'device': {'__v': 2,
    '_id': '5fc7894ff7fa2b262ecb6c45',
    'assignedTo': '5fc62223f7fa2b262ecb6c2d',
    'configured': True,
    'createdAt': '2020-12-02T12:32:15.235Z',
    'dType': '5ee38d3a4afdf74cf8b07dfb',
    'deviceId': 'LTSAC21220050009',
    'lastConfiguredAt': '2020-12-04T10:04:15.000Z',
    'lp': '2020-12-04T10:04:31.482Z',
    'macId': '24:6F:28:55:7C:55',
    'ns': '-69',
    'online': True,
    'ports': ['5fc9f66cf7fa2b262ecb6c5a'],
    'updatedAt': '2020-12-04T10:04:31.483Z',
    'version': 'AC_1.5'},
   'mac': '24:6F:28:55:7C:55',
   'no': '0',
   'portIcon': 'AC',
   'portType': 'AC',
   'status': None,
   'title': 'Switch',
   'updatedAt': '2020-12-04T10:04:43.657Z',
   'val': 1}},
 't': {0: 'o', 1: 'o', 2: 'o', 3: 'o', 4: 'o'},
 'updatedAt': {0: {'$date': '2020-12-02T12:32:57.781Z'},
  1: {'$date': '2020-12-02T12:32:59.955Z'},
  2: {'$date': '2020-12-02T12:33:13.686Z'},
  3: {'$date': '2020-12-02T12:33:16.199Z'},
  4: {'$date': '2020-12-04T10:04:43.662Z'}}}

CodePudding user response:

If I understand correctly, you want the difference in hours between the "updatedAt" and "createdAt" columns:

df = pd.json_normalize(dt["pVal"])
df["createdAt"] = pd.to_datetime(df["createdAt"])
df["updatedAt"] = pd.to_datetime(df["updatedAt"])
df["Difference"] = df["updatedAt"].sub(df["createdAt"]).dt.total_seconds().div(3600)
output = df.groupby(df["createdAt"].dt.date)["Difference"].sum().reset_index()

>>> output
    createdAt  Difference
0  2020-12-02    0.057404
1  2020-12-04    1.373037

CodePudding user response:

I have created a dataframe based on your little example that you provided.

import pandas as pd  
  
# assign data of lists.  
data = {'DATE': ['2020-12-02', '2020-12-02', '2020-12-04', '2020-12-04'], 'A': ['2020-12-02T12:32:57.781Z', '2020-12-02T12:32:59.955Z', '2020-12-02T13:33:13.686Z', '2020-12-02T13:33:16.199Z'], 'B':['2020-12-02T13:32:57.781Z', '2020-12-02T13:32:59.955Z', '2020-12-04T14:33:13.686Z', '2020-12-04T14:33:13.686Z'], 'Difference':[3,4,2,1]}  
  
# Create DataFrame  
df = pd.DataFrame(data) 

# Convert to datetime
df['DATE'] = pd.to_datetime(df ['DATE'])

Get the result:

df_final = df.groupby("DATE", as_index = False)["Difference"].sum()
df_final

enter image description here

  • Related