I am exploring bike share data.
I combined two tables: one containing bike share data and the other containing weather data. The 'Date Start' column comes from the bike share data, and the 'date' column comes from the weather data.
I would like to count the IDs (i.e. the rentals) for each hour, so I can see the effect of weather on bike usage.
ID | Start | End | Date Start | Duration | date | rain | temp | wdsp |
---|---|---|---|---|---|---|---|---|
1754125 | Eyre Square South | Glenina | 01 Jan 2019 00:17 | 00:15:02 | 01-jan-2019 00:00 | 0.0 | 9.9 | 4.0 |
1754170 | Brown Doorway | University Hospital Galway | 01 Jan 2019 07:55 | 00:04:57 | 01-jan-2019 01:00 | 0.0 | 9.3 | 4.0 |
1754209 | New Dock Street | New Dock Street | 01 Jan 2019 11:42 | 02:57:57 | 01-jan-2019 02:00 | 0.0 | 9.2 | 5.0 |
1754211 | Claddagh Basin | Merchants Gate | 01 Jan 2019 11:50 | 00:02:43 | 01-jan-2019 03:00 | 0.0 | 9.1 | 5.0 |
I have tried:
data.groupby(['date','ID']).size()
data.groupby(['date','ID']).size().reset_index(name='counts')
But I don't really know what I'm doing. Any help would be appreciated.
CodePudding user response:
I assume you want to count the rentals per date and hour. First make sure 'Date Start' is parsed as a datetime, then extract the date and the hour:
df['Date Start'] = pd.to_datetime(df['Date Start'])
df['Date'] = df['Date Start'].dt.normalize()
df['hour'] = df['Date Start'].dt.hour
This adds the Date and hour columns:
ID Start End Date Start \
0 1754125 Eyre Square South Glenina 2019-01-01 00:17:00
1 1754170 Brown Doorway University Hospital Galway 2019-01-01 07:55:00
2 1754209 New Dock Street New Dock Street 2019-01-01 11:42:00
3 1754211 Claddagh Basin Merchants Gate 2019-01-01 11:50:00
Duration date rain temp wdsp Date hour
0 00:15:02 2019-01-01 00:00 0.0 9.9 4.0 2019-01-01 0
1 00:04:57 2019-01-01 01:00 0.0 9.3 4.0 2019-01-01 7
2 02:57:57 2019-01-01 02:00 0.0 9.2 5.0 2019-01-01 11
3 00:02:43 2019-01-01 03:00 0.0 9.1 5.0 2019-01-01 11
and then group by date and hour; since each ID is a single rental, the size of each group is the number of rentals in that hour (grouping by ID as well would leave every group with size 1):
df.groupby(['Date','hour']).size()
which returns
Date        hour
2019-01-01  0       1
            7       1
            11      2
dtype: int64
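If you also want the weather alongside the counts, you can aggregate both in one step with named aggregation. A minimal sketch, assuming the merged frame is called df and carries the rain and temp columns from your sample (the output column names trips, rain, temp are just illustrative):
hourly = df.groupby(['Date', 'hour']).agg(
    trips=('ID', 'size'),    # number of rentals in that hour
    rain=('rain', 'mean'),   # mean rainfall in that hour
    temp=('temp', 'mean'),   # mean temperature in that hour
).reset_index()
print(hourly)
The trips column then lines up row-for-row with the averaged weather variables, which makes plotting rentals against rain or temp straightforward.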
CodePudding user response:
I did a similar project using the London bike sharing data. I used parse_dates when reading the CSV, set the timestamp as the index, and then extracted the datetime parts:
import pandas as pd

df = pd.read_csv('london_merged.csv', parse_dates=['timestamp'], index_col='timestamp')
df['hour'] = df.index.hour
df['day_of_month'] = df.index.day
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
df['date'] = df.index.date
print(df)
df.groupby(['date','hour']).size()
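As a variation on the same idea: once the timestamp is the index, you can skip the helper columns entirely and let resample do the hourly bucketing. A sketch under the same assumptions (df indexed by the parsed timestamp):
hourly_counts = df.resample('H').size()            # one row per clock hour, value = number of records
by_hour_of_day = df.groupby(df.index.hour).size()  # aggregate profile across all days (0-23)
print(hourly_counts.head())
print(by_hour_of_day)
resample gives you a continuous hourly series (including empty hours), while grouping on df.index.hour collapses all days into a single 24-hour usage profile.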