Given the following DataFrame of pandas in Python:
| ID | date |
|--------------|---------------------------------------|
| ESP | 2022-03-02 07:24:19 01:00 |
| ESP | 2022-03-02 07:24:19 01:00 |
| ESP | 2022-03-02 08:00:00 01:00 |
| UK | 2022-03-02 08:08:30 01:00 |
| ESP | 2022-03-02 09:11:50 01:00 |
| USA | 2022-03-02 10:19:11 01:00 |
| UK | 2022-03-02 10:12:11 01:00 |
| USA | 2022-03-03 08:33:22 01:00 |
| USA | 2022-03-03 09:23:22 01:00 |
| UK | 2022-03-03 12:13:22 01:00 |
| UK | 2022-03-03 12:35:22 01:00 |
With the following code implemented in Python, I get the following DataFrame:
def create_dataframe(df):
df['date'] = pd.to_datetime(df['date'].astype(str).str.split(' ').str[0])
string = df['date'].groupby(df['date'].dt.floor('H')).count()
df = pd.DataFrame({'date': string.index.date, 'start_interval': string.index.time,
'end_interval': (string.index pd.DateOffset(hours=1)).time,
'total_rows': string.to_numpy()})
return df
| date | start_interval | end_interval | total_rows |
|-----------------------|-------------------|-------------------|------------|
| 2022-03-02 | 07:00:00 | 08:00:00 | 2 |
| 2022-03-02 | 08:00:00 | 09:00:00 | 2 |
| 2022-03-02 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-02 | 10:00:00 | 11:00:00 | 2 |
| 2022-03-03 | 08:00:00 | 09:00:00 | 1 |
| 2022-03-03 | 09:00:00 | 10:00:00 | 1 |
| 2022-03-03 | 12:00:00 | 13:00:00 | 2 |
I would like to add to the table the information provided by the 'ID' column, i.e. get this DataFrame:
| ID | date | start_interval | end_interval | total_rows |
|--------|-----------------------|-------------------|-------------------|------------|
| ESP | 2022-03-02 | 07:00:00 | 08:00:00 | 2 |
| ESP | 2022-03-02 | 08:00:00 | 09:00:00 | 1 |
| UK | 2022-03-02 | 08:00:00 | 09:00:00 | 1 |
| ESP | 2022-03-02 | 09:00:00 | 10:00:00 | 1 |
| USA | 2022-03-02 | 10:00:00 | 11:00:00 | 1 |
| UK | 2022-03-02 | 10:00:00 | 11:00:00 | 1 |
| USA | 2022-03-03 | 08:00:00 | 09:00:00 | 1 |
| USA | 2022-03-03 | 09:00:00 | 10:00:00 | 1 |
| UK | 2022-03-03 | 12:00:00 | 13:00:00 | 2 |
How could I modify the supplied code to obtain the resulting table? Thank you in advance for your help.
CodePudding user response:
Does this produce what you are looking for:
result = (
df
.groupby(['ID', df['date'].dt.floor('H')]).agg(total_rows=('date', 'count'))
.reset_index()
.assign(
start_interval=lambda df: df['date'].dt.time,
end_interval=lambda df: (df['date'] pd.Timedelta(hours=1)).dt.time,
date=lambda df: df['date'].dt.date
)
)
Result:
ID date total_rows start_interval end_interval
0 ESP 2022-03-02 2 07:00:00 08:00:00
1 ESP 2022-03-02 1 08:00:00 09:00:00
2 ESP 2022-03-02 1 09:00:00 10:00:00
3 UK 2022-03-02 1 08:00:00 09:00:00
4 UK 2022-03-02 1 10:00:00 11:00:00
5 UK 2022-03-03 2 12:00:00 13:00:00
6 USA 2022-03-02 1 10:00:00 11:00:00
7 USA 2022-03-03 1 08:00:00 09:00:00
8 USA 2022-03-03 1 09:00:00 10:00:00