Home > other >  Datetime processing from CSV file
Datetime processing from CSV file

Time:11-19

I have CSV data like this:

import io
import pandas as pd

s = """
Name,Earliest,Latest,Data
G.E,2018-04-21T02:18:59.830000Z,2019-01-28T06:54:28.660000Z,3
G.N,2018-04-21T02:18:59.830000Z,2019-01-28T06:54:28.660000Z,3
G.Z,2018-04-21T02:18:59.830000Z,2019-01-28T06:54:28.660000Z,3
G.E,2018-04-21T03:50:59.880000Z,2019-01-01T16:05:34.070000Z,1
G.N,2018-04-21T03:50:59.880000Z,2019-01-01T16:05:34.070000Z,1
G.Z,2018-04-21T03:50:59.880000Z,2019-01-01T16:05:34.070000Z,1
G.E,2018-04-21T05:34:00.000000Z,2018-08-25T15:33:26.380000Z,2
G.N,2018-04-21T05:34:00.000000Z,2018-08-25T15:33:26.380000Z,2
G.Z,2018-04-21T05:34:00.000000Z,2018-08-25T15:33:26.380000Z,2
G.E,2018-04-21T08:51:00.000000Z,2019-03-31T07:52:36.960000Z,2
G.N,2018-04-21T08:51:00.000000Z,2019-03-31T07:52:36.960000Z,2
G.Z,2018-04-21T08:51:00.000000Z,2019-03-31T07:52:36.960000Z,2
G.E,2018-04-21T09:27:00.050000Z,2019-02-20T21:33:56.470000Z,1
G.N,2018-04-21T09:27:00.050000Z,2019-02-20T21:36:39.190000Z,1
G.Z,2018-04-21T09:27:00.050000Z,2019-02-20T21:36:39.190000Z,1
G.E,2018-04-21T08:22:00.000000Z,2019-04-05T09:54:24.000000Z,61
G.N,2018-04-21T08:22:00.000000Z,2019-04-05T09:54:24.000000Z,61
G.Z,2018-04-21T08:22:00.000000Z,2019-04-05T09:56:46.380000Z,61
G.E,2018-04-21T06:30:00.060000Z,2019-02-20T18:36:58.730000Z,1
G.N,2018-04-21T06:30:00.060000Z,2019-02-20T18:39:41.450000Z,1
G.Z,2018-04-21T06:30:00.060000Z,2019-02-20T18:39:41.450000Z,1
G.E,2018-04-21T04:52:00.000000Z,2019-02-20T16:58:59.260000Z,1
G.N,2018-04-21T04:52:00.000000Z,2019-02-20T17:01:41.980000Z,1
G.Z,2018-04-21T04:52:00.000000Z,2019-02-20T17:01:41.980000Z,1
G.E,2018-04-21T03:31:00.030000Z,2019-03-28T11:05:44.640000Z,7
G.N,2018-04-21T03:31:00.030000Z,2019-03-28T11:05:44.640000Z,7
G.Z,2018-04-21T03:31:00.030000Z,2019-03-28T11:05:44.640000Z,7
G.E,2018-04-22T01:53:00.040000Z,2019-01-27T15:54:52.640000Z,1
G.N,2018-04-22T01:53:00.040000Z,2019-01-27T15:54:52.640000Z,1
G.Z,2018-04-22T01:53:00.040000Z,2019-01-27T15:54:52.640000Z,1
G.E,2018-04-22T03:35:00.000000Z,2018-12-04T01:08:34.970000Z,1
G.N,2018-04-22T03:35:00.000000Z,2018-12-04T01:08:34.970000Z,1
G.Z,2018-04-22T03:35:00.000000Z,2018-12-04T01:08:34.970000Z,1
G.E,2018-04-22T05:24:00.000000Z,2019-02-21T17:31:21.020000Z,1
G.N,2018-04-22T05:24:00.000000Z,2019-02-21T17:34:03.740000Z,1
G.Z,2018-04-22T05:24:00.000000Z,2019-02-21T17:34:03.740000Z,1
G.E,2018-04-22T08:15:00.070000Z,2019-02-21T20:22:00.360000Z,1
G.N,2018-04-22T08:15:00.070000Z,2019-02-21T20:24:43.080000Z,1
G.Z,2018-04-22T08:15:00.070000Z,2019-02-21T20:24:43.080000Z,1
G.E,2018-04-24T01:48:00.000000Z,2019-04-06T03:00:37.920000Z,45
G.N,2018-04-24T01:48:00.000000Z,2019-04-06T03:00:37.920000Z,45
G.Z,2018-04-24T01:48:00.000000Z,2019-04-06T03:00:37.920000Z,45
G.E,2018-04-22T07:03:59.930000Z,2018-07-12T14:21:05.610000Z,1
G.N,2018-04-22T07:03:59.930000Z,2018-07-12T14:21:05.610000Z,1
G.Z,2018-04-22T07:03:59.930000Z,2018-07-12T14:21:05.610000Z,1
G.E,2018-04-22T05:06:00.000000Z,2019-04-06T13:47:30.610000Z,55
G.N,2018-04-22T05:06:00.000000Z,2019-04-06T13:47:30.610000Z,55
G.Z,2018-04-22T05:06:00.000000Z,2019-04-06T13:47:30.610000Z,55
G.E,2018-04-22T04:02:00.000000Z,2019-03-06T02:23:40.960000Z,11
G.N,2018-04-22T04:02:00.000000Z,2019-02-07T03:15:01.580000Z,8
G.Z,2018-04-22T04:02:00.000000Z,2019-03-06T02:23:40.960000Z,11
G.E,2018-04-22T02:53:00.050000Z,2019-02-21T14:59:58.290000Z,1
G.N,2018-04-22T02:53:00.050000Z,2019-02-21T15:02:41.010000Z,1
G.Z,2018-04-22T02:53:00.050000Z,2019-02-21T15:02:41.010000Z,1
G.E,2018-04-22T00:49:59.810000Z,2019-01-18T09:34:13.950000Z,1
G.N,2018-04-22T00:49:59.810000Z,2019-01-18T09:34:13.950000Z,1
G.Z,2018-04-22T00:49:59.810000Z,2019-01-18T09:34:13.950000Z,1
G.E,2018-04-24T01:27:59.870000Z,2019-04-04T01:32:35.380000Z,11
G.N,2018-04-24T01:27:59.870000Z,2019-04-04T01:32:35.380000Z,11
G.Z,2018-04-24T01:27:59.870000Z,2019-04-04T01:32:35.380000Z,11
G.E,2018-04-24T02:25:59.880000Z,2019-02-23T14:33:24.500000Z,1
G.N,2018-04-24T02:25:59.880000Z,2019-02-23T14:36:07.220000Z,1
G.Z,2018-04-24T02:25:59.880000Z,2019-02-23T14:36:07.220000Z,1
G.E,2018-04-24T04:25:00.000000Z,2019-01-21T23:53:41.500000Z,2
G.N,2018-04-24T04:25:00.000000Z,2019-01-21T23:53:41.500000Z,2
G.Z,2018-04-24T04:25:00.000000Z,2019-01-21T23:53:41.500000Z,2
G.E,2018-04-24T05:17:59.980000Z,2019-02-23T17:25:08.180000Z,1
G.N,2018-04-24T05:17:59.980000Z,2019-02-23T17:27:50.900000Z,1
G.Z,2018-04-24T05:17:59.980000Z,2019-02-23T17:27:50.900000Z,1
G.E,2018-04-24T05:31:00.050000Z,2019-01-14T17:44:51.070000Z,2
G.N,2018-04-24T05:31:00.050000Z,2019-01-14T17:44:51.070000Z,2
G.Z,2018-04-24T05:31:00.050000Z,2019-01-14T17:44:51.070000Z,2
G.E,2018-04-24T07:11:59.900000Z,2019-08-02T04:18:48.650000Z,2
G.N,2018-04-24T07:11:59.900000Z,2019-08-02T04:18:48.650000Z,2
G.Z,2018-04-24T07:11:59.900000Z,2019-08-02T04:18:48.650000Z,2
G.E,2018-04-24T07:28:00.000000Z,2019-03-18T15:55:14.880000Z,2
G.N,2018-04-24T07:28:00.000000Z,2019-03-18T15:55:14.880000Z,2
G.Z,2018-04-24T07:28:00.000000Z,2019-03-18T15:55:14.880000Z,2
G.E,2018-04-24T08:12:00.000000Z,2018-12-04T11:27:06.130000Z,1
G.N,2018-04-24T08:12:00.000000Z,2018-12-04T11:27:06.130000Z,1
G.Z,2018-04-24T08:12:00.000000Z,2018-12-04T11:27:06.130000Z,1
G.E,2018-04-27T23:45:00.000000Z,2019-04-05T01:45:08.160000Z,30
G.N,2018-04-27T23:45:00.000000Z,2019-04-05T01:45:08.160000Z,30
G.Z,2018-04-27T23:45:00.000000Z,2019-04-05T01:45:08.160000Z,30
G.E,2018-04-26T06:42:00.160000Z,2019-02-25T18:48:48.980000Z,1
G.N,2018-04-26T06:42:00.160000Z,2019-02-25T18:51:31.700000Z,1
G.Z,2018-04-26T06:42:00.160000Z,2019-02-25T18:51:31.700000Z,1
G.E,2018-04-26T05:54:00.000000Z,2019-01-30T18:06:26.420000Z,15
G.N,2018-04-26T05:54:00.000000Z,2019-01-30T18:06:26.420000Z,15
G.Z,2018-04-26T05:54:00.000000Z,2019-01-30T18:06:26.420000Z,15
G.E,2018-04-26T04:57:00.000000Z,2019-04-06T19:35:29.760000Z,102
G.N,2018-04-26T04:57:00.000000Z,2019-04-06T19:35:29.760000Z,102
G.Z,2018-04-26T04:57:00.000000Z,2019-04-06T19:35:29.760000Z,102
G.E,2018-04-26T03:46:00.070000Z,2018-10-10T12:22:21.820000Z,1
G.N,2018-04-26T03:46:00.070000Z,2018-10-10T12:22:21.820000Z,1
G.Z,2018-04-26T03:46:00.070000Z,2018-10-10T12:22:21.820000Z,1
G.E,2018-04-26T04:07:00.000000Z,2019-08-04T01:58:05.020000Z,2
G.N,2018-04-26T04:07:00.000000Z,2019-08-04T01:58:05.020000Z,2
G.Z,2018-04-26T04:07:00.000000Z,2019-08-04T01:58:05.020000Z,2
G.E,2018-04-26T02:29:00.090000Z,2019-03-27T01:19:05.280000Z,2
G.N,2018-04-26T02:29:00.090000Z,2019-03-27T01:27:13.440000Z,2
G.Z,2018-04-26T02:29:00.090000Z,2019-03-27T01:27:13.440000Z,2
G.E,2018-04-26T01:30:00.040000Z,2018-10-15T01:00:05.450000Z,38
G.N,2018-04-26T01:30:00.040000Z,2018-10-15T01:00:05.450000Z,38
G.Z,2018-04-26T01:30:00.040000Z,2018-10-15T01:00:05.450000Z,38
G.E,2018-04-26T01:08:59.910000Z,2018-10-17T04:28:27.180000Z,1
G.N,2018-04-26T01:08:59.910000Z,2018-10-17T04:28:27.180000Z,1
G.Z,2018-04-26T01:08:59.910000Z,2018-10-17T04:28:27.180000Z,1
G.E,2018-04-26T00:25:00.000000Z,2019-04-06T20:54:10.560000Z,46
G.N,2018-04-26T00:25:00.000000Z,2019-04-06T20:54:10.560000Z,46
G.Z,2018-04-26T00:25:00.000000Z,2019-04-06T20:54:10.560000Z,46
G.E,2018-04-25T23:26:00.000000Z,2018-10-17T02:36:08.800000Z,1
G.N,2018-04-25T23:26:00.000000Z,2018-10-17T02:36:08.800000Z,1
G.Z,2018-04-25T23:26:00.000000Z,2018-10-17T02:36:08.800000Z,1
G.E,2018-04-25T06:41:00.020000Z,2018-10-17T01:45:38.540000Z,54
G.N,2018-04-25T06:41:00.020000Z,2018-10-17T01:45:38.540000Z,54
G.Z,2018-04-25T06:41:00.020000Z,2018-10-17T01:45:38.540000Z,54
G.E,2018-04-25T05:52:00.000000Z,2018-10-17T01:14:59.370000Z,1
G.N,2018-04-25T05:52:00.000000Z,2018-10-17T01:14:59.370000Z,1
G.Z,2018-04-25T05:52:00.000000Z,2018-10-17T01:14:59.370000Z,1
G.E,2018-04-25T04:06:00.000000Z,2018-10-17T00:05:22.020000Z,1
G.N,2018-04-25T04:06:00.000000Z,2018-10-17T00:05:22.020000Z,1
G.Z,2018-04-25T04:06:00.000000Z,2018-10-17T00:05:22.020000Z,1
G.E,2018-04-25T02:50:00.160000Z,2018-10-16T23:28:31.900000Z,1
G.N,2018-04-25T02:50:00.160000Z,2018-10-16T23:28:31.900000Z,1
G.Z,2018-04-25T02:50:00.160000Z,2018-10-16T23:28:31.900000Z,1
G.E,2018-04-25T01:19:59.980000Z,2018-10-16T22:37:00.740000Z,1
G.N,2018-04-25T01:19:59.980000Z,2018-10-16T22:37:00.740000Z,1
G.Z,2018-04-25T01:19:59.980000Z,2018-10-16T22:37:00.740000Z,1
G.E,2018-04-25T00:16:59.900000Z,2018-08-06T06:14:20.130000Z,1
G.N,2018-04-25T00:16:59.900000Z,2018-08-06T06:14:20.130000Z,1
G.Z,2018-04-25T00:16:59.900000Z,2018-08-06T06:14:20.130000Z,1
G.E,2018-04-26T23:13:00.000000Z,2018-08-06T23:58:18.140000Z,304
G.N,2018-04-26T23:13:00.000000Z,2018-08-06T23:58:18.140000Z,304
G.Z,2018-04-26T23:13:00.000000Z,2018-08-06T23:58:18.140000Z,304
G.E,2018-04-27T00:28:00.000000Z,2018-10-17T23:12:34.030000Z,1
G.N,2018-04-27T00:28:00.000000Z,2018-10-17T23:12:34.030000Z,1
G.Z,2018-04-27T00:28:00.000000Z,2018-10-17T23:12:34.030000Z,1
G.E,2018-04-27T03:07:00.000000Z,2018-10-18T01:00:31.420000Z,1
G.Z,2018-04-27T03:07:00.000000Z,2018-10-18T00:59:50.740000Z,1
G.E,2018-04-27T03:06:00.070000Z,2018-10-18T01:22:09.570000Z,1
G.N,2018-04-27T03:06:00.070000Z,2018-10-18T01:22:09.570000Z,1
G.Z,2018-04-27T03:06:00.070000Z,2018-10-18T01:22:09.570000Z,1
G.E,2018-04-27T04:12:00.000000Z,2018-10-18T01:46:34.550000Z,1
G.N,2018-04-27T04:12:00.000000Z,2018-10-18T01:46:34.550000Z,1
G.Z,2018-04-27T04:12:00.000000Z,2018-10-18T01:46:34.550000Z,1
G.E,2018-04-27T04:55:00.070000Z,2018-10-18T02:39:40.760000Z,1
G.N,2018-04-27T04:55:00.070000Z,2018-10-18T02:39:40.760000Z,1
G.Z,2018-04-27T04:55:00.070000Z,2018-10-18T02:39:40.760000Z,1
"""

df = pd.read_csv(io.StringIO(s))

I want to group the data by Name. Then, for each group, I want to find out the first datetime (min of Earliest), the last datetime (max of Latest), and the sum of Data.

CodePudding user response:

This should be self-explanatory:

>>> groups = df.groupby('Name')
>>> mins = groups['Earliest'].min()
>>> mins
Name
G.E    2018-04-21T02:18:59.830000Z
G.N    2018-04-21T02:18:59.830000Z
G.Z    2018-04-21T02:18:59.830000Z
Name: Earliest, dtype: object

>>> maxes = groups['Latest'].max()
>>> maxes
Name
G.E    2019-08-04T01:58:05.020000Z
G.N    2019-08-04T01:58:05.020000Z
G.Z    2019-08-04T01:58:05.020000Z
Name: Latest, dtype: object

>>> sums = groups.sum()['Data']
>>> sums
Name
G.E    826
G.N    822
G.Z    826
Name: Data, dtype: int64

>>> for name, sub_df in groups:
...     print(f"The earliest for {name} is {mins[name]}, the latest is {maxes[name]}, and the sum of the data is {sums[name]}.")
  • Related