Python groupby id and datetime (only date and hour) - CodePudding

I have DataFrame data as a list of records, like below:

[
{"at": "2023-01-13 16:37", "id": 2},
{"at": "2023-01-13 16:38", "id": 2},
{"at": "2023-01-12 10:03", "id": 2},
{"at": "2023-01-11 10:45", "id": 2},
{"at": "2023-01-13 16:02", "id": 3},
]

I want to group this df by id and at, but at should be truncated to the date and hour. The result I want looks like this:

[
{"id": 2, "at": "2023-01-13- 16:00:00"},
{"id": 2, "at": "2023-01-12- 10:00:00"},
{"id": 2, "at": "2023-01-11- 10:00:00"},
{"id": 3, "at": "2023-01-13- 16:00:00"}
]

How can I do that with pandas? Could you help me?

CodePudding user response:

You can truncate at to the date and hour, then use drop_duplicates.

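import pandas as pd

# df is the DataFrame from the question; lowercase 'h' avoids the deprecated 'H' alias
df['at'] = pd.to_datetime(df['at']).dt.to_period('h').dt.strftime("%Y-%m-%d- %H:%M:%S")
df_new = df.drop_duplicates(subset=['at', 'id'])
print(df_new)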

Output:

                     at  id
0  2023-01-13- 16:00:00   2
2  2023-01-12- 10:00:00   2
3  2023-01-11- 10:00:00   2
4  2023-01-13- 16:00:00   3

And if you want to convert back to a list of dicts:

>>> df_new.to_dict('records')
[{'at': '2023-01-13- 16:00:00', 'id': 2},
 {'at': '2023-01-12- 10:00:00', 'id': 2},
 {'at': '2023-01-11- 10:00:00', 'id': 2},
 {'at': '2023-01-13- 16:00:00', 'id': 3}]
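
Neither the question nor the snippet above shows how df is built, so here is a minimal self-contained sketch of the same approach. It assumes the at values arrive as strings (the question shows them unquoted) and uses the lowercase "h" alias, since recent pandas releases deprecate "H". The name records is only for illustration.

```python
import pandas as pd

# The question's records, with the timestamps written as strings (an assumption).
records = [
    {"at": "2023-01-13 16:37", "id": 2},
    {"at": "2023-01-13 16:38", "id": 2},
    {"at": "2023-01-12 10:03", "id": 2},
    {"at": "2023-01-11 10:45", "id": 2},
    {"at": "2023-01-13 16:02", "id": 3},
]
df = pd.DataFrame(records)

# Truncate each timestamp to its hour, then render it in the requested format.
hourly = pd.to_datetime(df["at"]).dt.to_period("h")
df["at"] = hourly.dt.strftime("%Y-%m-%d- %H:%M:%S")

# Keep the first row seen for each (at, id) pair.
df_new = df.drop_duplicates(subset=["at", "id"])
print(df_new)
```

The surviving index labels (0, 2, 3, 4) match the output shown above, because drop_duplicates keeps the first occurrence by default.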

CodePudding user response:

If you need to aggregate with some function, use Series.dt.floor:

df['at'] = pd.to_datetime(df['at'])
# some aggregate function - e.g. size (lowercase 'h' avoids the deprecated 'H' alias)
df.groupby(['id', df['at'].dt.floor('h')]).size()
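
To make the aggregation concrete, here is a small sketch that counts rows per id and hour for the question's data. The column name count and the variable counts are my own choices, and the timestamps are assumed to be strings.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "at": [
            "2023-01-13 16:37",
            "2023-01-13 16:38",
            "2023-01-12 10:03",
            "2023-01-11 10:45",
            "2023-01-13 16:02",
        ],
        "id": [2, 2, 2, 2, 3],
    }
)
df["at"] = pd.to_datetime(df["at"])

# Group by id and by the timestamp floored to the hour, then count the rows
# in each group and turn the result back into a flat DataFrame.
counts = (
    df.groupby(["id", df["at"].dt.floor("h")])
      .size()
      .reset_index(name="count")
)
print(counts)
```

Only the group for id 2 at 2023-01-13 16:00 holds two rows; every other group holds one. Note that groupby sorts the groups by key, so the rows come out in ascending order rather than in the original order.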

If you need to remove duplicates, use Series.dt.floor to set the minutes and seconds to 00:00, then drop the duplicate rows with DataFrame.drop_duplicates:

df['at'] = pd.to_datetime(df['at'])

# floor to the hour (lowercase 'h' avoids the deprecated 'H' alias)
df = df.assign(at=df['at'].dt.floor('h')).drop_duplicates()
print(df)
                   at  id
0 2023-01-13 16:00:00   2
2 2023-01-12 10:00:00   2
3 2023-01-11 10:00:00   2
4 2023-01-13 16:00:00   3
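
One thing to watch: drop_duplicates() without arguments compares every column, which works here only because df has just id and at. A short sketch of the difference follows, where value is a hypothetical extra column that is not in the question's data.

```python
import pandas as pd

# "value" is a hypothetical extra column, added only to show the effect.
df = pd.DataFrame(
    {
        "at": pd.to_datetime(
            ["2023-01-13 16:37", "2023-01-13 16:38", "2023-01-13 16:02"]
        ),
        "id": [2, 2, 3],
        "value": [10, 20, 30],
    }
)
floored = df.assign(at=df["at"].dt.floor("h"))

# Without subset, all columns are compared, so the differing "value"
# keeps both id=2 rows.
all_columns = floored.drop_duplicates()

# With subset, only id and at decide what counts as a duplicate.
by_key = floored.drop_duplicates(subset=["id", "at"])

print(len(all_columns), len(by_key))  # prints: 3 2
```

If the real DataFrame carries more columns, passing subset=['id', 'at'] is probably what you want.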

If you need the datetimes formatted as strings, use Series.dt.strftime:

df['at'] = pd.to_datetime(df['at'])

# floor to the hour, then format as strings before dropping duplicates
df = (df.assign(at=df['at'].dt.floor('h')
                          .dt.strftime("%Y-%m-%d- %H:%M:%S"))
        .drop_duplicates())
print(df)
                     at  id
0  2023-01-13- 16:00:00   2
2  2023-01-12- 10:00:00   2
3  2023-01-11- 10:00:00   2
4  2023-01-13- 16:00:00   3

Finally, for a list of dictionaries with the key order swapped, select the columns in the order you want and use DataFrame.to_dict:

print(df[['id','at']].to_dict(orient='records'))

[{'id': 2, 'at': '2023-01-13- 16:00:00'},
 {'id': 2, 'at': '2023-01-12- 10:00:00'},
 {'id': 2, 'at': '2023-01-11- 10:00:00'},
 {'id': 3, 'at': '2023-01-13- 16:00:00'}]