I'm trying to generate a frequency DataFrame in Jupyter/Python of HH:MM:SS elements of a datetime column.
Asside from itterating through all HH:MM:SS combinations and counting them (I need to include 0 values), is there a function in Python that can do it for me?
.value_counts() creates what I need, however, 0 values are not included.
Many thanks, in advance, for your assistance :)
EDIT:
Example Data:
TransactionID | DateTime | Date | Time |
---|---|---|---|
012sad9j20j | 01/01/22 04:23:32 | 01/01/22 | 04:23:32 |
938hfd82dj2 | 07/04/22 23:12:59 | 07/04/22 | 23:12:59 |
s9j20jd902j | 18/05/22 13:44:19 | 18/05/22 | 13:44:19 |
Expected to generate a dataframe containing:
Time | Count |
---|---|
04:23:31 | 0 |
04:23:32 | 1 |
04:23:33 | 0 |
CodePudding user response:
Here's a solution:
from pandas import DataFrame
from datetime import datetime, time
df = DataFrame([
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=1, second=2)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=3, second=4)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=17, hour=13, minute=5, second=6)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=18, hour=13, minute=1, second=2)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=19, hour=13, minute=3, second=4)}
])
times = DataFrame(df['DateTime'].dt.time).groupby(['DateTime'])['DateTime'].count()
time_counts = DataFrame((
(t := time(h, m, s), int(times[t]) if t in times else 0)
for h in range(24) for m in range(60) for s in range(60)
), columns=['Time', 'Counts'])
print(time_counts)
print(time_counts[time_counts['Time'] == time(13, 1, 2)])
This basically creates an additional DataFrame times
based on the original data (assuming you don't even have a Time
and Date
column, but if you do, you can of course use those) - times
takes all the times from df
and then groups and counts them.
However, that is missing the times which don't occur in df
, so time_counts
is constructed by generated all possible times and either selecting the count from times
or 0 if it doesn't exist in times
.
Result:
Time Counts
0 00:00:00 0
1 00:00:01 0
2 00:00:02 0
3 00:00:03 0
4 00:00:04 0
... ... ...
86395 23:59:55 0
86396 23:59:56 0
86397 23:59:57 0
86398 23:59:58 0
86399 23:59:59 0
[86400 rows x 2 columns]
Time Counts
46862 13:01:02 2
User @riley's suggestion seems to point at a nicer solution, something like:
# create df as before, then:
df['Time'] = df['DateTime'].dt.time.astype('category')
result = df.value_counts('Time')
But that seems to have the same problem as you originally stated, even though the dtype
of the 'Time'
column is category
. Perhaps someone has additional suggestions to make that work.