I have DataFrame data, as a list of records, like the one below:
[
{"at": "2023-01-13 16:37", "id": 2},
{"at": "2023-01-13 16:38", "id": 2},
{"at": "2023-01-12 10:03", "id": 2},
{"at": "2023-01-11 10:45", "id": 2},
{"at": "2023-01-13 16:02", "id": 3},
]
I want to group this df by id and at, but at should be truncated to the date and hour. The result I want is shown below.
[
{"id": 2, "at": "2023-01-13- 16:00:00"},
{"id": 2, "at": "2023-01-12- 10:00:00"},
{"id": 2, "at": "2023-01-11- 10:00:00"},
{"id": 3, "at": "2023-01-13- 16:00:00"}
]
How can I do that with pandas? Could you help me?
CodePudding user response:
You can reduce at to the date and hour, then use drop_duplicates:
df['at'] = pd.to_datetime(df['at']).dt.to_period('H').dt.strftime("%Y-%m-%d- %H:%M:%S")
df_new = df.drop_duplicates(subset=['at', 'id'])
print(df_new)
Output:
                     at  id
0  2023-01-13- 16:00:00   2
2  2023-01-12- 10:00:00   2
3  2023-01-11- 10:00:00   2
4  2023-01-13- 16:00:00   3
And if you want to convert back to a list of dicts:
>>> df_new.to_dict('records')
[{'at': '2023-01-13- 16:00:00', 'id': 2},
{'at': '2023-01-12- 10:00:00', 'id': 2},
{'at': '2023-01-11- 10:00:00', 'id': 2},
{'at': '2023-01-13- 16:00:00', 'id': 3}]
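For anyone who wants to run the steps above end to end, here is a minimal sketch. It assumes the at values are strings (the question shows them unquoted), and it uses the lowercase 'h' alias because pandas 2.2 and later warn about the uppercase 'H'. The variable names records and result are only illustrative.

```python
import pandas as pd

# Sample records from the question; `at` is assumed to be a string here.
records = [
    {"at": "2023-01-13 16:37", "id": 2},
    {"at": "2023-01-13 16:38", "id": 2},
    {"at": "2023-01-12 10:03", "id": 2},
    {"at": "2023-01-11 10:45", "id": 2},
    {"at": "2023-01-13 16:02", "id": 3},
]
df = pd.DataFrame(records)

# Truncate each timestamp to its hour, then format it as a string.
df["at"] = (
    pd.to_datetime(df["at"])
    .dt.to_period("h")
    .dt.strftime("%Y-%m-%d- %H:%M:%S")
)

# Keep the first row of every (id, hour) pair.
df_new = df.drop_duplicates(subset=["at", "id"])
result = df_new.to_dict("records")
print(result)
```

The two id 2 rows at 16:37 and 16:38 fall into the same hour, so only the first of them is kept.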
CodePudding user response:
If you need to aggregate with some function, use Series.dt.floor:
df['at'] = pd.to_datetime(df['at'])
# some aggregate function, e.g. size
df.groupby(['id', df['at'].dt.floor('H')]).size()
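As a rough illustration of the aggregation, the sketch below runs it on the sample data from the question and flattens the result. The column name n is hypothetical, and the lowercase 'h' alias is used because newer pandas releases (2.2 and later) warn about 'H'.

```python
import pandas as pd

# Sample data from the question, with `at` given as strings.
df = pd.DataFrame({
    "at": ["2023-01-13 16:37", "2023-01-13 16:38", "2023-01-12 10:03",
           "2023-01-11 10:45", "2023-01-13 16:02"],
    "id": [2, 2, 2, 2, 3],
})
df["at"] = pd.to_datetime(df["at"])

# Count rows per (id, hour); reset_index turns the MultiIndex
# result back into a flat frame with a count column named "n".
counts = (
    df.groupby(["id", df["at"].dt.floor("h")])
    .size()
    .reset_index(name="n")
)
print(counts)
```

Note that groupby sorts the group keys by default, so the rows come back ordered by id and hour rather than in their original order.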
If you need to remove duplicates, use Series.dt.floor to set the minutes and seconds to 00:00, then remove the duplicates with DataFrame.drop_duplicates:
df['at'] = pd.to_datetime(df['at'])
df = df.assign(at = df['at'].dt.floor('H')).drop_duplicates()
print(df)
                   at  id
0 2023-01-13 16:00:00   2
2 2023-01-12 10:00:00   2
3 2023-01-11 10:00:00   2
4 2023-01-13 16:00:00   3
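One caveat worth sketching: drop_duplicates() without arguments compares every column. That is fine here because the frame only has at and id, but if the frame carried other columns, rows from the same hour would no longer be identical, and passing subset is probably what you want. The value column below is hypothetical and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "at": pd.to_datetime(["2023-01-13 16:37", "2023-01-13 16:38",
                          "2023-01-13 16:02"]),
    "id": [2, 2, 3],
    "value": [10, 20, 30],  # hypothetical extra column
})
floored = df.assign(at=df["at"].dt.floor("h"))

# All columns compared: the two id=2 rows differ in `value`, so both survive.
all_cols = floored.drop_duplicates()

# Only (id, at) compared: one row per id and hour, keeping the first.
by_key = floored.drop_duplicates(subset=["id", "at"])

print(len(all_cols), len(by_key))
```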
If you need the datetimes formatted as strings, use Series.dt.strftime:
df['at'] = pd.to_datetime(df['at'])
df = (df.assign(at=df['at'].dt.floor('H')
                         .dt.strftime("%Y-%m-%d- %H:%M:%S"))
        .drop_duplicates())
print(df)
                     at  id
0  2023-01-13- 16:00:00   2
2  2023-01-12- 10:00:00   2
3  2023-01-11- 10:00:00   2
4  2023-01-13- 16:00:00   3
Last, for a list of dictionaries with the column order swapped, select the columns in the order you want and use DataFrame.to_dict:
print(df[['id', 'at']].to_dict(orient='records'))
[{'id': 2, 'at': '2023-01-13- 16:00:00'},
 {'id': 2, 'at': '2023-01-12- 10:00:00'},
 {'id': 2, 'at': '2023-01-11- 10:00:00'},
 {'id': 3, 'at': '2023-01-13- 16:00:00'}]
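Putting the steps of this answer together, a small helper might look like the sketch below. The function name hourly_unique and its fmt parameter are made up for illustration, the input is assumed to be a list of dicts with string at values, and 'h' is used instead of 'H' to avoid the deprecation warning in recent pandas.

```python
import pandas as pd


def hourly_unique(records, fmt="%Y-%m-%d- %H:%M:%S"):
    """Return one dict per (id, hour), in order of first appearance."""
    df = pd.DataFrame(records)
    # Floor to the hour, then format the timestamps as strings.
    df["at"] = pd.to_datetime(df["at"]).dt.floor("h").dt.strftime(fmt)
    # Select columns in the desired key order before deduplicating.
    return df[["id", "at"]].drop_duplicates().to_dict(orient="records")


records = [
    {"at": "2023-01-13 16:37", "id": 2},
    {"at": "2023-01-13 16:38", "id": 2},
    {"at": "2023-01-12 10:03", "id": 2},
    {"at": "2023-01-11 10:45", "id": 2},
    {"at": "2023-01-13 16:02", "id": 3},
]
result = hourly_unique(records)
print(result)
```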