datetimes I'm new to pandas and need some help. I have a csv file with namy datetimes registered by many devices sorted by an ID. They are the time and date that a specific device was registered in the area. I need to calculate how much time each device was there (like minutes), based on minimum and maximum datetimes, or tell that the device was there in case of only one register and I got stuck on the beggining. I guess I need to do a groupby of some sort, but I'm lost.
All I coud do was:
visitas = pd.read_csv("visitas.csv")
visitas['datetime_local'] = pd.to_datetime(visitas.datetime_local)
CodePudding user response:
Using .groupby:
import numpy as np
import pandas as pd
data = {
"device_id": [1, 2, 3, 4, 3, 4, 1, 2],
"datetime_local": ["2022-09-22 06:00:00", "2022-09-22 08:58:10", "2022-09-22 22:23:02", "2022-09-22 09:12:54",
"2022-09-23 01:16:17", "2022-09-22 11:18:05", "2022-09-22 12:01:23", "2022-09-22 09:15:02"]
}
visitas = pd.DataFrame(data)
visitas["datetime_local"] = pd.to_datetime(visitas["datetime_local"])
visitas["device_present_minutes"] = (
visitas
.sort_values(["device_id", "datetime_local"], ascending=True)
.groupby("device_id")["datetime_local"]
.diff() / pd.Timedelta(minutes=1)
)
visitas = (
visitas
.replace({np.NaN: "-"})
.sort_values("device_id", ascending=True)
.reset_index(drop=True)
)
print(visitas)
device_id datetime_local device_present_minutes
0 1 2022-09-22 06:00:00 -
1 1 2022-09-22 12:01:23 361.383333
2 2 2022-09-22 08:58:10 -
3 2 2022-09-22 09:15:02 16.866667
4 3 2022-09-22 22:23:02 -
5 3 2022-09-23 01:16:17 173.25
6 4 2022-09-22 09:12:54 -
7 4 2022-09-22 11:18:05 125.183333