I am exploring bike share data.
I combined two tables: one containing bike share data and the other containing weather data. The 'Date Start' column comes from the bike share data, and the 'date' column comes from the weather data.
I would like to count the IDs (i.e. the rentals) for each hour, so I can see the effect of weather on bike usage.
ID | Start | End | Date Start | Duration | date | rain | temp | wdsp |
---|---|---|---|---|---|---|---|---|
1754125 | Eyre Square South | Glenina | 01 Jan 2019 00:17 | 00:15:02 | 01-jan-2019 00:00 | 0.0 | 9.9 | 4.0 |
1754170 | Brown Doorway | University Hospital Galway | 01 Jan 2019 07:55 | 00:04:57 | 01-jan-2019 01:00 | 0.0 | 9.3 | 4.0 |
1754209 | New Dock Street | New Dock Street | 01 Jan 2019 11:42 | 02:57:57 | 01-jan-2019 02:00 | 0.0 | 9.2 | 5.0 |
1754211 | Claddagh Basin | Merchants Gate | 01 Jan 2019 11:50 | 00:02:43 | 01-jan-2019 03:00 | 0.0 | 9.1 | 5.0 |
I have tried:
data.groupby(['date','ID']).size()
data.groupby(['date','ID']).size().reset_index(name='counts')
But I don't really know what I'm doing. Any help would be appreciated.
CodePudding user response:
I assume you want to count the rentals per date and hour. First make sure 'Date Start' is parsed as a datetime, then extract the date and the hour:
df['Date Start'] = pd.to_datetime(df['Date Start'])
df['Date'] = df['Date Start'].dt.normalize()
df['hour'] = df['Date Start'].dt.hour
This adds the Date and hour columns:
ID Start End Date Start \
0 1754125 Eyre Square South Glenina 2019-01-01 00:17:00
1 1754170 Brown Doorway University Hospital Galway 2019-01-01 07:55:00
2 1754209 New Dock Street New Dock Street 2019-01-01 11:42:00
3 1754211 Claddagh Basin Merchants Gate 2019-01-01 11:50:00
Duration date rain temp wdsp Date hour
0 00:15:02 2019-01-01 00:00 0.0 9.9 4.0 2019-01-01 0
1 00:04:57 2019-01-01 01:00 0.0 9.3 4.0 2019-01-01 7
2 02:57:57 2019-01-01 02:00 0.0 9.2 5.0 2019-01-01 11
3 00:02:43 2019-01-01 03:00 0.0 9.1 5.0 2019-01-01 11
and then group by date and hour; since each ID is a single rental, the size of each group is the number of rentals in that hour (grouping by ID as well would leave every group with size 1):
df.groupby(['Date','hour']).size()
which returns
Date        hour
2019-01-01  0       1
            7       1
            11      2
dtype: int64
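If you also want the weather alongside the counts, you can aggregate both in one step with named aggregation. A minimal sketch, assuming the merged frame is called df and carries the rain and temp columns from your sample (the output column names trips, rain, temp are just illustrative):
hourly = df.groupby(['Date', 'hour']).agg(
    trips=('ID', 'size'),    # number of rentals in that hour
    rain=('rain', 'mean'),   # mean rainfall in that hour
    temp=('temp', 'mean'),   # mean temperature in that hour
).reset_index()
print(hourly)
The trips column then lines up row-for-row with the averaged weather variables, which makes plotting rentals against rain or temp straightforward.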
CodePudding user response:
I did a similar project using the London bike sharing data. I used parse_dates when reading the CSV, set the timestamp as the index, and then extracted the datetime parts:
import pandas as pd

df = pd.read_csv('london_merged.csv', parse_dates=['timestamp'], index_col='timestamp')
df['hour'] = df.index.hour
df['day_of_month'] = df.index.day
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
df['date'] = df.index.date
print(df)
df.groupby(['date','hour']).size()
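As a variation on the same idea: once the timestamp is the index, you can skip the helper columns entirely and let resample do the hourly bucketing. A sketch under the same assumptions (df indexed by the parsed timestamp):
hourly_counts = df.resample('H').size()            # one row per clock hour, value = number of records
by_hour_of_day = df.groupby(df.index.hour).size()  # aggregate profile across all days (0-23)
print(hourly_counts.head())
print(by_hour_of_day)
resample gives you a continuous hourly series (including empty hours), while grouping on df.index.hour collapses all days into a single 24-hour usage profile.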