I am stuck at below point not getting how do I get expected output
I have used pandas to read json data
code :
import pandas as pd
dt = pd.read_json('/content/sample_data/abc.json')
mycolumns = ['A','B']
dt[mycolumns ]
Below is the following data I have read it using pandas :
0 {'$date': '2020-12-02T12:32:57.781Z'} {'$date': '2020-12-02T13:32:57.781Z'}
1 {'$date': '2020-12-02T12:32:59.955Z'} {'$date': '2020-12-02T13:32:59.955Z'}
2 {'$date': '2020-12-04T12:33:13.686Z'} {'$date': '2020-12-04T14:33:13.686Z'}
3 {'$date': '2020-12-04T12:33:16.199Z'} {'$date': '2020-12-04T14:33:16.199Z'}
Now I need to do group by on date : YYY-MM-DD
and calculate difference of time in hours and do sum on it
Trying to get into below format so that I can do some difference of A - B
Note : I have given dummy difference just for reference not able to identify how to implement
DATE A B Difference
0 2020-12-02 '2020-12-02T12:32:57.781Z' '2020-12-02T13:32:57.781Z' 3
1 2020-12-02 '2020-12-02T12:32:59.955Z' '2020-12-02T13:32:59.955Z' 4
2 2020-12-04 '2020-12-02T13:33:13.686Z' '2020-12-04T14:33:13.686Z' 2
3 2020-12-04 '2020-12-02T13:33:16.199Z' '2020-12-04T14:33:13.686Z' 1
MY expected output
DATE Sum
0 2020-12-02 7
1 2020-12-04 3
I am providing data in csv format just use it
0,{'$date': '2020-12-02T12:32:57.781Z'},{'$date': '2020-12-02T13:32:57.781Z'}
1,{'$date': '2020-12-02T12:32:59.955Z'},{'$date': '2020-12-02T13:32:59.955Z'}
2,{'$date': '2020-12-04T12:33:13.686Z'},{'$date': '2020-12-04T14:33:13.686Z'}
3,{'$date': '2020-12-04T12:33:16.199Z'},{'$date': '2020-12-04T14:33:13.686Z'}
Please find data in dict format
{'__v': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'_id': {0: {'$oid': '5fc7897969b8ab14e1cad8f4'},
1: {'$oid': '5fc7897b69b8ab14e1cad8f5'},
2: {'$oid': '5fc7898969b8ab14e1cad8fa'},
3: {'$oid': '5fc7898c69b8ab14e1cad8fb'},
4: {'$oid': '5fca09bb45db2d39eee040b2'}},
'createdAt': {0: {'$date': '2020-12-02T12:32:57.781Z'},
1: {'$date': '2020-12-02T12:32:59.955Z'},
2: {'$date': '2020-12-02T12:33:13.686Z'},
3: {'$date': '2020-12-02T12:33:16.199Z'},
4: {'$date': '2020-12-04T10:04:43.662Z'}},
'dId': {0: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
1: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
2: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
3: {'$oid': '5fc7894ff7fa2b262ecb6c45'},
4: {'$oid': '5fc7894ff7fa2b262ecb6c45'}},
'm': {0: '24:6F:28:55:7C:55',
1: '24:6F:28:55:7C:55',
2: '24:6F:28:55:7C:55',
3: '24:6F:28:55:7C:55',
4: '24:6F:28:55:7C:55'},
'pId': {0: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
1: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
2: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
3: {'$oid': '5fc7894ff7fa2b262ecb6c46'},
4: {'$oid': '5fc9f66cf7fa2b262ecb6c5a'}},
'pVal': {0: {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c46',
'cat': 's',
'createdAt': '2020-12-02T12:32:15.237Z',
'device': {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c45',
'assignedTo': '5fc62223f7fa2b262ecb6c2d',
'createdAt': '2020-12-02T12:32:15.235Z',
'dType': '5ee38d3a4afdf74cf8b07dfb',
'deviceId': 'LTAC21220050009',
'macId': '24:6F:28:55:7C:55',
'ports': ['5fc7894ff7fa2b262ecb6c46'],
'updatedAt': '2020-12-02T12:32:36.773Z'},
'mac': '24:6F:28:55:7C:55',
'no': '0',
'portIcon': 'AC',
'portType': 'AC',
'status': None,
'title': 'Switch',
'updatedAt': '2020-12-02T12:32:57.776Z',
'val': 1},
1: {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c46',
'cat': 's',
'createdAt': '2020-12-02T12:32:15.237Z',
'device': {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c45',
'assignedTo': '5fc62223f7fa2b262ecb6c2d',
'createdAt': '2020-12-02T12:32:15.235Z',
'dType': '5ee38d3a4afdf74cf8b07dfb',
'deviceId': 'LTAC21220050009',
'macId': '24:6F:28:55:7C:55',
'ports': ['5fc7894ff7fa2b262ecb6c46'],
'updatedAt': '2020-12-02T12:32:36.773Z'},
'mac': '24:6F:28:55:7C:55',
'no': '0',
'portIcon': 'AC',
'portType': 'AC',
'status': None,
'title': 'Switch',
'updatedAt': '2020-12-02T12:32:59.951Z',
'val': 0},
2: {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c46',
'cat': 's',
'createdAt': '2020-12-02T12:32:15.237Z',
'device': {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c45',
'assignedTo': '5fc62223f7fa2b262ecb6c2d',
'createdAt': '2020-12-02T12:32:15.235Z',
'dType': '5ee38d3a4afdf74cf8b07dfb',
'deviceId': 'LTAC21220050009',
'macId': '24:6F:28:55:7C:55',
'ports': ['5fc7894ff7fa2b262ecb6c46'],
'updatedAt': '2020-12-02T12:32:36.773Z'},
'mac': '24:6F:28:55:7C:55',
'no': '0',
'portIcon': 'AC',
'portType': 'AC',
'status': None,
'title': 'Switch',
'updatedAt': '2020-12-02T12:33:13.682Z',
'val': 1},
3: {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c46',
'cat': 's',
'createdAt': '2020-12-02T12:32:15.237Z',
'device': {'__v': 0,
'_id': '5fc7894ff7fa2b262ecb6c45',
'assignedTo': '5fc62223f7fa2b262ecb6c2d',
'createdAt': '2020-12-02T12:32:15.235Z',
'dType': '5ee38d3a4afdf74cf8b07dfb',
'deviceId': 'LTAC21220050009',
'macId': '24:6F:28:55:7C:55',
'ports': ['5fc7894ff7fa2b262ecb6c46'],
'updatedAt': '2020-12-02T12:32:36.773Z'},
'mac': '24:6F:28:55:7C:55',
'no': '0',
'portIcon': 'AC',
'portType': 'AC',
'status': None,
'title': 'Switch',
'updatedAt': '2020-12-02T12:33:16.195Z',
'val': 0},
4: {'__v': 0,
'_id': '5fc9f66cf7fa2b262ecb6c5a',
'cat': 's',
'createdAt': '2020-12-04T08:42:20.724Z',
'device': {'__v': 2,
'_id': '5fc7894ff7fa2b262ecb6c45',
'assignedTo': '5fc62223f7fa2b262ecb6c2d',
'configured': True,
'createdAt': '2020-12-02T12:32:15.235Z',
'dType': '5ee38d3a4afdf74cf8b07dfb',
'deviceId': 'LTSAC21220050009',
'lastConfiguredAt': '2020-12-04T10:04:15.000Z',
'lp': '2020-12-04T10:04:31.482Z',
'macId': '24:6F:28:55:7C:55',
'ns': '-69',
'online': True,
'ports': ['5fc9f66cf7fa2b262ecb6c5a'],
'updatedAt': '2020-12-04T10:04:31.483Z',
'version': 'AC_1.5'},
'mac': '24:6F:28:55:7C:55',
'no': '0',
'portIcon': 'AC',
'portType': 'AC',
'status': None,
'title': 'Switch',
'updatedAt': '2020-12-04T10:04:43.657Z',
'val': 1}},
't': {0: 'o', 1: 'o', 2: 'o', 3: 'o', 4: 'o'},
'updatedAt': {0: {'$date': '2020-12-02T12:32:57.781Z'},
1: {'$date': '2020-12-02T12:32:59.955Z'},
2: {'$date': '2020-12-02T12:33:13.686Z'},
3: {'$date': '2020-12-02T12:33:16.199Z'},
4: {'$date': '2020-12-04T10:04:43.662Z'}}}
CodePudding user response:
If I understand correctly, you want the difference in hours between the "updatedAt" and "createdAt" columns:
df = pd.json_normalize(dt["pVal"])
df["createdAt"] = pd.to_datetime(df["createdAt"])
df["updatedAt"] = pd.to_datetime(df["updatedAt"])
df["Difference"] = df["updatedAt"].sub(df["createdAt"]).dt.total_seconds().div(3600)
output = df.groupby(df["createdAt"].dt.date)["Difference"].sum().reset_index()
>>> output
createdAt Difference
0 2020-12-02 0.057404
1 2020-12-04 1.373037
CodePudding user response:
I have created a dataframe based on your little example that you provided.
import pandas as pd
# assign data of lists.
data = {'DATE': ['2020-12-02', '2020-12-02', '2020-12-04', '2020-12-04'], 'A': ['2020-12-02T12:32:57.781Z', '2020-12-02T12:32:59.955Z', '2020-12-02T13:33:13.686Z', '2020-12-02T13:33:16.199Z'], 'B':['2020-12-02T13:32:57.781Z', '2020-12-02T13:32:59.955Z', '2020-12-04T14:33:13.686Z', '2020-12-04T14:33:13.686Z'], 'Difference':[3,4,2,1]}
# Create DataFrame
df = pd.DataFrame(data)
# Convert to datetime
df['DATE'] = pd.to_datetime(df ['DATE'])
Get the result:
df_final = df.groupby("DATE", as_index = False)["Difference"].sum()
df_final