Iterate between the rows of a DataFrame of pandas 2 by 2-CodePudding

Let be the following DataFrame, where the time column is defined by a Python TimeDelta object:

ID	date	direction	time
0	2022-01-02	IN	NaT
0	2022-01-03	OUT	1 days
0	2022-01-04	IN	NaT
0	2022-01-08	OUT	4 days
1	2022-03-02	IN	NaT
1	2022-03-05	OUT	3 days
2	2022-04-06	IN	NaT
2	2022-04-10	OUT	4 days

I need to iterate over the existing DataFrame to create another one in the most efficient way containing information from rows two by two, i.e:

ID	start_date	end_date	time
0	2022-01-02	2022-01-03	1 days
0	2022-01-04	2022-01-08	4 days
1	2022-03-02	2022-03-05	3 days
2	2022-04-06	2022-04-10	4 days

Assume that the original DataFrame is contained in df. I need a way to iterate over the rows two by two, being able to query in each iteration the information of the first row and its next one. I hope you can help me, thank you in advance.

CodePudding user response：

You can use df.loc df.shift to create final dataframe:

df_out = pd.DataFrame(
    {
        "ID": df.loc[::2, "ID"],
        "start_date": df.loc[::2, "date"],
        "end_date": df.shift(-1).loc[::2, "date"],
        "time": df.shift(-1).loc[::2, "time"],
    }
)
print(df_out)

Prints:

   ID  start_date    end_date    time
0   0  2022-01-02  2022-01-03  1 days
2   0  2022-01-04  2022-01-08  4 days
4   1  2022-03-02  2022-03-05  3 days
6   2  2022-04-06  2022-04-10  4 days

Or:

df_out = pd.DataFrame(
    {
        "ID": df.loc[::2, "ID"].values,
        "start_date": df.loc[::2, "date"].values,
        "end_date": df.loc[1::2, "date"].values,
        "time": df.loc[1::2, "time"].values,
    }
)