Home > Net >  Creation of a time-only frequency in Python
Creation of a time-only frequency in Python

Time:06-17

I'm trying to generate a frequency DataFrame in Jupyter/Python of HH:MM:SS elements of a datetime column.

Asside from itterating through all HH:MM:SS combinations and counting them (I need to include 0 values), is there a function in Python that can do it for me?

.value_counts() creates what I need, however, 0 values are not included.

Many thanks, in advance, for your assistance :)

EDIT:

Example Data:

TransactionID DateTime Date Time
012sad9j20j 01/01/22 04:23:32 01/01/22 04:23:32
938hfd82dj2 07/04/22 23:12:59 07/04/22 23:12:59
s9j20jd902j 18/05/22 13:44:19 18/05/22 13:44:19

Expected to generate a dataframe containing:

Time Count
04:23:31 0
04:23:32 1
04:23:33 0

CodePudding user response:

Here's a solution:

from pandas import DataFrame
from datetime import datetime, time

df = DataFrame([
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=1, second=2)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=3, second=4)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=17, hour=13, minute=5, second=6)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=18, hour=13, minute=1, second=2)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=19, hour=13, minute=3, second=4)}
])

times = DataFrame(df['DateTime'].dt.time).groupby(['DateTime'])['DateTime'].count()

time_counts = DataFrame((
    (t := time(h, m, s), int(times[t]) if t in times else 0)
    for h in range(24) for m in range(60) for s in range(60)
), columns=['Time', 'Counts'])
print(time_counts)

print(time_counts[time_counts['Time'] == time(13, 1, 2)])

This basically creates an additional DataFrame times based on the original data (assuming you don't even have a Time and Date column, but if you do, you can of course use those) - times takes all the times from df and then groups and counts them.

However, that is missing the times which don't occur in df, so time_counts is constructed by generated all possible times and either selecting the count from times or 0 if it doesn't exist in times.

Result:

           Time  Counts
0      00:00:00       0
1      00:00:01       0
2      00:00:02       0
3      00:00:03       0
4      00:00:04       0
...         ...     ...
86395  23:59:55       0
86396  23:59:56       0
86397  23:59:57       0
86398  23:59:58       0
86399  23:59:59       0

[86400 rows x 2 columns]
           Time  Counts
46862  13:01:02       2

User @riley's suggestion seems to point at a nicer solution, something like:

# create df as before, then:

df['Time'] = df['DateTime'].dt.time.astype('category')
result = df.value_counts('Time')

But that seems to have the same problem as you originally stated, even though the dtype of the 'Time' column is category. Perhaps someone has additional suggestions to make that work.

  • Related