So I have a CSV with data that looks like:
Machine Name Timestamp
Washer #25 2021-08-22 06:07:21
Washer #2 2021-08-22 06:35:15
Dryer #34 2021-08-22 17:05:55
... ...
Dryer #29 2021-08-28 05:17:12
The CSV itself spans a week, Sunday to Saturday and is over 1300 rows long. I am trying to count how many times machines are used per hour per day. So in the above example my output would look like
2021-08-22: 0600, Machine Count: 2
2021-08-22: 1700, Machine Count: 1
2021-08-23: 0500, Machine Count: 1
It uses military hours so 0000-2359
Currently I am using Pandas, and have opened and read my CSV and grabbed the hour and date out of the Timestamp column. I re-added the hour as a new column at the end of the dataframe, so a timestamp of 09:45:16 would be added in as 9. This was done while I was experimenting and can be taken out. Current code is:
csv = PATHTOCSV
df = pd.read_csv(csv)
hourHolder = (pd.to_datetime(df['Timestamp']).dt.hour)
dateHolder = pd.to_datetime(df['Timestamp']).dt.date
df['Hour'] = hourHolder
fullDictionary = df.to_dict('Records')
I have tried various methods using loops and trying to query the data directly using df.groupby(pd.Grouper(key='Timestamp', freq='H')).count()
However that gives an error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
CodePudding user response:
Floor the datetime column to the hour, groupby and count.
s = """Machine Name,Timestamp
Washer #25,2021-08-22 06:07:21
Washer #2,2021-08-22 06:35:15
Dryer #34,2021-08-22 17:05:55
Dryer #29,2021-08-28 05:17:12"""
df = pd.read_csv(StringIO(s))
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.groupby(df['Timestamp'].dt.floor('h'))['Machine Name'].count()
Timestamp
2021-08-22 06:00:00 2
2021-08-22 17:00:00 1
2021-08-28 05:00:00 1
Name: Machine Name, dtype: int64