Home > database >  Iterate between the rows of a DataFrame of pandas 2 by 2
Iterate between the rows of a DataFrame of pandas 2 by 2

Time:04-12

Let be the following DataFrame, where the time column is defined by a Python TimeDelta object:

ID date direction time
0 2022-01-02 IN NaT
0 2022-01-03 OUT 1 days
0 2022-01-04 IN NaT
0 2022-01-08 OUT 4 days
1 2022-03-02 IN NaT
1 2022-03-05 OUT 3 days
2 2022-04-06 IN NaT
2 2022-04-10 OUT 4 days

I need to iterate over the existing DataFrame to create another one in the most efficient way containing information from rows two by two, i.e:

ID start_date end_date time
0 2022-01-02 2022-01-03 1 days
0 2022-01-04 2022-01-08 4 days
1 2022-03-02 2022-03-05 3 days
2 2022-04-06 2022-04-10 4 days

Assume that the original DataFrame is contained in df. I need a way to iterate over the rows two by two, being able to query in each iteration the information of the first row and its next one. I hope you can help me, thank you in advance.

CodePudding user response:

You can use df.loc df.shift to create final dataframe:

df_out = pd.DataFrame(
    {
        "ID": df.loc[::2, "ID"],
        "start_date": df.loc[::2, "date"],
        "end_date": df.shift(-1).loc[::2, "date"],
        "time": df.shift(-1).loc[::2, "time"],
    }
)
print(df_out)

Prints:

   ID  start_date    end_date    time
0   0  2022-01-02  2022-01-03  1 days
2   0  2022-01-04  2022-01-08  4 days
4   1  2022-03-02  2022-03-05  3 days
6   2  2022-04-06  2022-04-10  4 days

Or:

df_out = pd.DataFrame(
    {
        "ID": df.loc[::2, "ID"].values,
        "start_date": df.loc[::2, "date"].values,
        "end_date": df.loc[1::2, "date"].values,
        "time": df.loc[1::2, "time"].values,
    }
)
  • Related