I want to count the events in each one-second interval of a CSV data file and draw a histogram of the results, but I don't understand how to get the number of events per second. Can someone please help me with this?
My code:
from matplotlib import pyplot as pl
import pandas as pd
import numpy as np


def read_data():
    df = pd.read_csv("test.csv", usecols=['time', 'unix_time', 'name'])
    df['time'] = pd.to_datetime(df['time'])
    df['unix_time'] = (df['unix_time']).astype(int)
    df.info()
    i = 1
    time_counts = df.groupby((3600 * df.time.dt.minute + df.time.dt.second) // i * i)['time'].count()
    print(time_counts)


if __name__ == "__main__":
    read_data()
The output looks strange:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   time       33 non-null     datetime64[ns]
 1   unix_time  33 non-null     int32
 2   name       33 non-null     object
dtypes: datetime64[ns](1), int32(1), object(1)
memory usage: 788.0 bytes
time
18 1
25217 1
43209 1
43219 1
46804 1
54047 1
61241 1
64815 1
64833 1
68402 1
75620 1
79235 1
82806 1
82837 2
86407 1
86446 1
93625 1
97254 1
104446 1
140438 1
144050 1
162025 1
169250 1
180050 1
183623 1
183658 1
194404 1
194412 2
194433 1
194438 1
205219 1
Name: time, dtype: int64
The data in the CSV is:
time unix_time name
2022-12-15 08:00:18.034 1671091218034 apple
2022-12-15 08:07:17.376 1671091637376 apple
2022-12-15 08:12:09.648 1671091929648 apple
2022-12-15 08:12:19.320 1671091939320 apple
2022-12-15 08:13:04.623 1671091984623 apple
2022-12-15 08:15:47.103 1671092147103 apple
2022-12-15 08:17:41.878 1671092261878 apple
2022-12-15 08:18:15.842 1671092295842 apple
2022-12-15 08:18:33.786 1671092313786 apple
2022-12-15 08:19:02.022 1671092342022 apple
2022-12-15 08:21:20.350 1671092480350 apple
2022-12-15 08:22:35.603 1671092555603 apple
2022-12-15 08:23:06.009 1671092586009 apple
2022-12-15 08:23:37.101 1671092617101 apple
2022-12-15 08:23:37.334 1671092617334 apple
2022-12-15 08:24:07.645 1671092647645 apple
2022-12-15 08:24:46.978 1671092686978 apple
2022-12-15 08:26:25.430 1671092785430 apple
2022-12-15 08:27:54.027 1671092874027 apple
2022-12-15 08:29:46.712 1671092986712 apple
2022-12-15 08:39:38.742 1671093578742 apple
2022-12-15 08:40:50.310 1671093650310 apple
2022-12-15 08:45:25.007 1671093925007 apple
2022-12-15 08:47:50.770 1671094070770 apple
2022-12-15 08:50:50.856 1671094250856 apple
2022-12-15 08:51:23.914 1671094283914 apple
2022-12-15 08:51:58.572 1671094318572 apple
2022-12-15 08:54:04.959 1671094444959 apple
2022-12-15 08:54:12.424 1671094452424 apple
2022-12-15 08:54:12.807 1671094452807 apple
2022-12-15 08:54:33.562 1671094473562 apple
2022-12-15 08:54:38.531 1671094478531 apple
2022-12-15 08:57:19.777 1671094639777 apple
CodePudding user response:
Use pd.Grouper with a one-second frequency. The result also includes the seconds in which nothing happened, with a count of 0:
df['time'] = pd.to_datetime(df['time'])
time_counts = df.groupby(pd.Grouper(freq='1s', key='time'))['time'].count()
print(time_counts)
time
2022-12-15 08:00:18 1
2022-12-15 08:00:19 0
2022-12-15 08:00:20 0
2022-12-15 08:00:21 0
2022-12-15 08:00:22 0
..
2022-12-15 08:57:15 0
2022-12-15 08:57:16 0
2022-12-15 08:57:17 0
2022-12-15 08:57:18 0
2022-12-15 08:57:19 1
Freq: S, Name: time, Length: 3422, dtype: int64
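As a self-contained sketch of the Grouper approach, the snippet below runs it on four rows copied from the question's data. The helper name count_events and the comma-separated inline sample are assumptions made for illustration. The bin width is a parameter, so the same call can also produce per-minute counts.

```python
import io

import pandas as pd

# Four rows copied from the question's data (assumed comma-separated here).
SAMPLE = """time,unix_time,name
2022-12-15 08:23:06.009,1671092586009,apple
2022-12-15 08:23:37.101,1671092617101,apple
2022-12-15 08:23:37.334,1671092617334,apple
2022-12-15 08:24:07.645,1671092647645,apple
"""


def count_events(df, freq="1s"):
    """Count rows per time bin of width `freq`, including empty bins.

    Hypothetical helper, not part of pandas.
    """
    return df.groupby(pd.Grouper(key="time", freq=freq))["time"].count()


df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["time"])
per_second = count_events(df)            # one bin per second
per_minute = count_events(df, "1min")    # same data, one-minute bins

print(per_second[per_second > 0])        # show only the non-empty seconds
print(per_minute)
```

Filtering with per_second > 0 hides the empty bins when only the busy seconds matter.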
Or use Series.dt.floor to drop the milliseconds. This keeps only the seconds that contain at least one event:
df['time'] = pd.to_datetime(df['time'])
time_counts = df.groupby(df['time'].dt.floor('s'))['time'].count()
print(time_counts)
time
2022-12-15 08:00:18 1
2022-12-15 08:07:17 1
2022-12-15 08:12:09 1
2022-12-15 08:12:19 1
2022-12-15 08:13:04 1
2022-12-15 08:15:47 1
2022-12-15 08:17:41 1
2022-12-15 08:18:15 1
2022-12-15 08:18:33 1
2022-12-15 08:19:02 1
2022-12-15 08:21:20 1
2022-12-15 08:22:35 1
2022-12-15 08:23:06 1
2022-12-15 08:23:37 2
2022-12-15 08:24:07 1
2022-12-15 08:24:46 1
2022-12-15 08:26:25 1
2022-12-15 08:27:54 1
2022-12-15 08:29:46 1
2022-12-15 08:39:38 1
2022-12-15 08:40:50 1
2022-12-15 08:45:25 1
2022-12-15 08:47:50 1
2022-12-15 08:50:50 1
2022-12-15 08:51:23 1
2022-12-15 08:51:58 1
2022-12-15 08:54:04 1
2022-12-15 08:54:12 2
2022-12-15 08:54:33 1
2022-12-15 08:54:38 1
2022-12-15 08:57:19 1
Name: time, dtype: int64
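Since the question also asks for a histogram, here is a minimal sketch of drawing the per-second counts as a bar chart with matplotlib. The inline sample, the output file name events_per_second.png, and the axis labels are assumptions for illustration, not part of the original code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Four rows copied from the question's data (assumed comma-separated here).
SAMPLE = """time,unix_time,name
2022-12-15 08:23:06.009,1671092586009,apple
2022-12-15 08:23:37.101,1671092617101,apple
2022-12-15 08:23:37.334,1671092617334,apple
2022-12-15 08:24:07.645,1671092647645,apple
"""

df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["time"])

# Truncate each timestamp to the whole second, then count rows per second.
counts = df.groupby(df["time"].dt.floor("s"))["time"].count()

# One bar per second that contains at least one event.
labels = list(counts.index.strftime("%H:%M:%S"))
fig, ax = plt.subplots(figsize=(8, 3))
ax.bar(labels, counts.to_numpy())
ax.set_xlabel("time (1-second bins)")
ax.set_ylabel("number of events")
ax.tick_params(axis="x", rotation=90)
fig.tight_layout()
fig.savefig("events_per_second.png")  # assumed output file name
```

The labels are formatted as strings, so the bars are evenly spaced rather than placed on a true time axis.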