Group the data by one column, then build rows from the values in a data frame

Time:06-19

I have a data frame with multiple ids in the column id. For each day, there are 5 time steps (6:00, 6:15, 6:30, 6:45, 7:00). However, some days do not have all 5, and I want to fill the missing values with NaN. See the following example:

import pandas as pd
df = pd.DataFrame()
df['id'] =   [1, 1, 1, 1, 1, 2, 2, 2,3, 3]
df['val'] = [11, 10, 12, 3, 4, 5, 125, 45,31, -2]
df['date'] = ['2019-03-31 06:00:00','2019-03-31 06:15:00', '2019-03-31 06:30:00', '2019-03-31 06:45:00', '2019-03-31 07:00:00', '2019-03-31 06:00:00', '2019-03-31 06:30:00',
              '2019-03-31 06:45:00', '2019-03-31 06:00:00', '2019-03-31 06:15:00']
df

For example, for id=1 we have 5 time steps.

For id=2, we have 3 time steps.

For id=3, we have 2 time steps.

So I want to put the values for each id into a single row, and keep only the date (without the time) in that row.

Here is the desired output for my data frame:

[Image: desired output table]

Can you help me with this? Thank you so much.

CodePudding user response:

One way using pandas.DataFrame.pivot:

df["dates"] = pd.to_datetime(df["date"]).dt.date
new_df = df.pivot(index=["id", "dates"], columns="date", values="val")
new_df.columns = [f"val{i + 1}" for i in range(new_df.shape[1])]
new_df.reset_index()

Output:

   id       dates  val1  val2   val3  val4  val5
0   1  2019-03-31  11.0  10.0   12.0   3.0   4.0
1   2  2019-03-31   5.0   NaN  125.0  45.0   NaN
2   3  2019-03-31  31.0  -2.0    NaN   NaN   NaN
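
The pivot above uses the full timestamp as the column key, so it yields exactly five columns only because the sample covers a single day and every time step appears at least once. A possible variation, sketched here under the assumption that the five time steps listed in the question form a fixed grid, is to pivot on the time of day and reindex the columns against that grid. The names `slot`, `slots` and `wide` are made up for this illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    "val": [11, 10, 12, 3, 4, 5, 125, 45, 31, -2],
    "date": [
        "2019-03-31 06:00:00", "2019-03-31 06:15:00", "2019-03-31 06:30:00",
        "2019-03-31 06:45:00", "2019-03-31 07:00:00", "2019-03-31 06:00:00",
        "2019-03-31 06:30:00", "2019-03-31 06:45:00", "2019-03-31 06:00:00",
        "2019-03-31 06:15:00",
    ],
})

# Split each timestamp into a calendar date and a time-of-day label.
ts = pd.to_datetime(df["date"])
df["dates"] = ts.dt.date
df["slot"] = ts.dt.strftime("%H:%M")

# Assumption: the expected time steps are fixed and known in advance.
slots = ["06:00", "06:15", "06:30", "06:45", "07:00"]

# One row per (id, dates), one column per time of day. reindex adds any
# slot that never occurs in the data as an all-NaN column.
wide = (
    df.pivot(index=["id", "dates"], columns="slot", values="val")
      .reindex(columns=slots)
)
wide.columns = [f"val{i + 1}" for i in range(len(slots))]
wide = wide.reset_index()
print(wide)
```

With the sample data this should give the same table as above. The difference would only show up with more than one day of data, or when a time step is missing for every id.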