Counts of rows per hour


I am exploring bike share data.

I combined two tables: one containing bike share data and the other containing weather data. The 'Date Start' column comes from the bike share data; the 'date' column comes from the weather data.

I would like a count of IDs for each hour, so I can see the effect of weather on bike usage.

ID       Start              End                         Date Start         Duration  date               rain  temp  wdsp
1754125  Eyre Square South  Glenina                     01 Jan 2019 00:17  00:15:02  01-jan-2019 00:00  0.0   9.9   4.0
1754170  Brown Doorway      University Hospital Galway  01 Jan 2019 07:55  00:04:57  01-jan-2019 01:00  0.0   9.3   4.0
1754209  New Dock Street    New Dock Street             01 Jan 2019 11:42  02:57:57  01-jan-2019 02:00  0.0   9.2   5.0
1754211  Claddagh Basin     Merchants Gate              01 Jan 2019 11:50  00:02:43  01-jan-2019 03:00  0.0   9.1   5.0

I have tried:

data.groupby(['date','ID']).size()
data.groupby(['date','ID']).size().reset_index(name='counts')

But I don't really know what I'm doing. Any help would be appreciated.

CodePudding user response:

I assume you want to be able to count the rows by ID, Date and hour, so you can do this:

# 'Date Start' must be datetime dtype (convert with pd.to_datetime first if it is still a string)
df['Date'] = df['Date Start'].dt.normalize()  # calendar date with the time zeroed out
df['hour'] = df['Date Start'].dt.hour         # hour of day as an integer

to get the dates and full hours:

     ID              Start                         End          Date Start  \
0  1754125  Eyre Square South                     Glenina 2019-01-01 00:17:00   
1  1754170      Brown Doorway  University Hospital Galway 2019-01-01 07:55:00   
2  1754209    New Dock Street             New Dock Street 2019-01-01 11:42:00   
3  1754211     Claddagh Basin              Merchants Gate 2019-01-01 11:50:00   

   Duration              date  rain  temp  wdsp       Date  hour  
0  00:15:02  2019-01-01 00:00   0.0   9.9   4.0 2019-01-01     0  
1  00:04:57  2019-01-01 01:00   0.0   9.3   4.0 2019-01-01     7  
2  02:57:57  2019-01-01 02:00   0.0   9.2   5.0 2019-01-01    11  
3  00:02:43  2019-01-01 03:00   0.0   9.1   5.0 2019-01-01    11  

and then use groupby:

df.groupby(['ID','Date','hour']).size()

which returns

ID        Date        hour
1754125  2019-01-01  0       1
1754170  2019-01-01  7       1
1754209  2019-01-01  11      1
1754211  2019-01-01  11      1
dtype: int64
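
Note that because each ID identifies a single trip, all of the groups above have size 1. If what you are after is the number of rides in each hour, set against that hour's weather, one variant (a sketch of mine, not part of the answer above) drops ID from the grouping keys and aggregates the weather columns in the same pass:

# rides per hour plus that hour's average weather readings
# (assumes each row's rain/temp/wdsp values describe the hour the ride started)
hourly = (
    df.groupby(['Date', 'hour'])
      .agg(rides=('ID', 'size'),
           rain=('rain', 'mean'),
           temp=('temp', 'mean'),
           wdsp=('wdsp', 'mean'))
      .reset_index()
)

hourly then has one row per (Date, hour) with the ride count sitting next to the weather, which is the shape needed to compare weather against usage.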

CodePudding user response:

I did a similar project using the London bike riding data. I used parse_dates when reading the CSV, set the timestamp column as the index, and then pulled the datetime parts from the index:

import pandas as pd

# parse the timestamp column while reading and make it the index
df = pd.read_csv('london_merged.csv', parse_dates=['timestamp'], index_col='timestamp')

# derive the datetime parts straight from the DatetimeIndex
df['hour'] = df.index.hour
df['day_of_month'] = df.index.day
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
df['date'] = df.index.date
print(df)

df.groupby(['date', 'hour']).size()
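
Because the timestamp is the index here, an alternative sketch (mine, not from the answer) is to let resample do the hourly bucketing without any helper columns:

# count the rows falling into each hourly bucket, straight off the DatetimeIndex
df.resample('h').size()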