Home > Back-end >  Python How to count the amount of times something happens in an hour per day per week from a CSV tha
Python How to count the amount of times something happens in an hour per day per week from a CSV tha

Time:09-22

So I have a CSV with data that looks like:

Machine Name Timestamp
Washer #25   2021-08-22 06:07:21
Washer #2    2021-08-22 06:35:15
Dryer #34    2021-08-22 17:05:55
...          ...
Dryer #29    2021-08-28 05:17:12

The CSV itself spans a week, Sunday to Saturday and is over 1300 rows long. I am trying to count how many times machines are used per hour per day. So in the above example my output would look like

2021-08-22: 0600, Machine Count: 2
2021-08-22: 1700, Machine Count: 1
2021-08-23: 0500, Machine Count: 1

It uses military hours so 0000-2359

Currently I am using Pandas, and have opened and read my CSV and grabbed the hour and date out of the Timestamp column. I re-added the hour as a new column at the end of the dataframe, so a timestamp of 09:45:16 would be added in as 9. This was done while I was experimenting and can be taken out. Current code is:

csv = PATHTOCSV
df = pd.read_csv(csv)

hourHolder = (pd.to_datetime(df['Timestamp']).dt.hour)
dateHolder = pd.to_datetime(df['Timestamp']).dt.date
df['Hour'] = hourHolder

fullDictionary = df.to_dict('Records')

I have tried various methods using loops and trying to query the data directly using df.groupby(pd.Grouper(key='Timestamp', freq='H')).count()

However that gives an error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

CodePudding user response:

Floor the datetime column to the hour, groupby and count.

s = """Machine Name,Timestamp
Washer #25,2021-08-22 06:07:21
Washer #2,2021-08-22 06:35:15
Dryer #34,2021-08-22 17:05:55
Dryer #29,2021-08-28 05:17:12"""

df = pd.read_csv(StringIO(s))
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.groupby(df['Timestamp'].dt.floor('h'))['Machine Name'].count()

Timestamp
2021-08-22 06:00:00    2
2021-08-22 17:00:00    1
2021-08-28 05:00:00    1
Name: Machine Name, dtype: int64
  • Related