Using Pandas Iterrows and Derive Time Difference into an Other Column-CodePudding

I'm wanting to find how long LocID_2 has spend at location Loc_2.

For example, the time LocID_2 006 spent at Loc_2 11.0 is 1 second. The time LocID_2 006 spent at Loc_2 12.0 is 3 second. The time LocID_2 003 spent at Loc_2 14.0 is 11 second, and so on. I need to derive time_at_loc_1

              TimeStamp      Loc_1 Loc_2 LocID_2 time_at_loc_1
0   2022-07-24 07:06:48  10.331905  11.0     006       0:00:01
1   2022-07-24 07:06:49  10.479264  11.0     006       0:00:01
2   2022-07-24 07:06:50  10.538208  12.0     006       0:00:03
3   2022-07-24 07:06:51  10.685568  12.0     006       0:00:03
4   2022-07-24 07:06:52  10.744512  12.0     006       0:00:03
5   2022-07-24 07:06:53  10.862399  12.0     006       0:00:03
6   2022-07-24 07:07:32  10.862399  17.0     006       0:00:05
7   2022-07-24 07:07:33  10.891870  17.0     006       0:00:05
8   2022-07-24 07:07:35  11.068702  17.0     006       0:00:05
9   2022-07-24 07:07:37  11.186589  17.0     006       0:00:05
10  2022-07-24 07:07:39  11.363420  14.0     003       0:00:11
11  2022-07-24 07:07:50  15.422364  14.0     003       0:00:11

The code I currently have. Just not sure how to iterate properly to derive the time difference and allocate correctly. Currently it takes that last derived time (e.g. 0:00:11) and allocates it to the entire 'time_at_loc_1' column.

lst_RFID = []
for k in df['Loc_2']:
    lst_RFID.append(k)

timedelta = []
for x in set(lst_RFID):
    df_RFID = df.loc[df['Loc_2'].isin([x])]
    time_min1 = pd.to_datetime(df_RFID["TimeStamp"].min(), format='%Y-%m-%d %H:%M:%S')
    time_max1 = pd.to_datetime(df_RFID["TimeStamp"].max(), format='%Y-%m-%d %H:%M:%S')
    time_over1 = time_max1 - time_min1
    
    for index, row in df.iterrows():
        time_delta = str(datetime.timedelta(seconds=time_over1.seconds))
        if row['Loc_2']) == x:
            df['time_at_loc_1'] = time_delta

Assistance greatly appreciated to derive 'time_at_loc_1'

tks

CodePudding user response：

Iterrows is a trap. It's basically never the best option, and often leads to more trouble than it's worth. Doing things through manual iteration throws out the whole point of using pandas.

df.TimeStamp = pd.to_datetime(df.TimeStamp)
df['time_at_loc_1'] = df.groupby('Loc_2')['TimeStamp'].transform(lambda x: x.max()-x.min())
print(df)

Output:

             TimeStamp      Loc_1  Loc_2 LocID_2   time_at_loc_1
0  2022-07-24 07:06:48  10.331905   11.0     006 0 days 00:00:01
1  2022-07-24 07:06:49  10.479264   11.0     006 0 days 00:00:01
2  2022-07-24 07:06:50  10.538208   12.0     006 0 days 00:00:03
3  2022-07-24 07:06:51  10.685568   12.0     006 0 days 00:00:03
4  2022-07-24 07:06:52  10.744512   12.0     006 0 days 00:00:03
5  2022-07-24 07:06:53  10.862399   12.0     006 0 days 00:00:03
6  2022-07-24 07:07:32  10.862399   17.0     006 0 days 00:00:05
7  2022-07-24 07:07:33  10.891870   17.0     006 0 days 00:00:05
8  2022-07-24 07:07:35  11.068702   17.0     006 0 days 00:00:05
9  2022-07-24 07:07:37  11.186589   17.0     006 0 days 00:00:05
10 2022-07-24 07:07:39  11.363420   14.0     003 0 days 00:00:11
11 2022-07-24 07:07:50  15.422364   14.0     003 0 days 00:00:11