Group by the data based on the two column (id and date) and then build the rows with the values in d

Time:06-21

I have a data frame with multiple ids in the column id. Each day should have 5 time steps (6:00, 6:15, 6:30, 6:45, 7:00). However, some days do not have all 5, and I want to fill the missing values with NaN. See the following example:

import pandas as pd
df = pd.DataFrame()
df['id'] =   [1, 1, 1, 1, 1, 2, 2, 2,3, 3, 1, 1]
df['val'] = [11, 10, 12, 3, 4, 5, 125, 45,31, -2,5,6]
df['date'] = ['2019-03-31 06:00:00','2019-03-31 06:15:00', '2019-03-31 06:30:00', '2019-03-31 06:45:00', '2019-03-31 07:00:00', '2019-03-31 06:00:00', '2019-03-31 06:30:00',
              '2019-03-31 06:45:00', '2019-03-31 06:00:00', '2019-03-31 06:15:00', '2019-04-1 06:00:00', '2019-04-1 06:15:00']

For example, id=1 has 5 time steps on 2019-03-31 and two values on 2019-04-01.

id=2 has 3 time steps.

id=3 has 2 time steps.

So I want to put the values for each id and day in a single row, and keep only the day (not the time) in that row.

(The expected output was attached as an image, which is not reproduced here.)

Currently I am using the following code, which places all the values side by side but creates 7 columns. I want 5 columns.

df["dates"] = pd.to_datetime(df["date"]).dt.date
new_df = df.pivot(index=["id", "dates"], columns="date", values="val")
new_df.columns = [f"val{i + 1}" for i in range(new_df.shape[1])]
new_df.reset_index()

Can you help me with that?

CodePudding user response:

Create the columns based on the time of day. I added one line to your code and changed the pivot to use the time instead of the full timestamp.

You were pivoting on 'date', which contains both the date and the time, so every distinct timestamp became its own column and you ended up with 7 columns.
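A quick way to see the difference, as a small sketch: count the distinct full timestamps and the distinct times of day. The series name `dates` is only for illustration, and the April dates are zero-padded here to keep parsing unambiguous.

```python
import pandas as pd

# Same timestamps as in the question (April dates zero-padded).
dates = pd.Series([
    '2019-03-31 06:00:00', '2019-03-31 06:15:00', '2019-03-31 06:30:00',
    '2019-03-31 06:45:00', '2019-03-31 07:00:00', '2019-03-31 06:00:00',
    '2019-03-31 06:30:00', '2019-03-31 06:45:00', '2019-03-31 06:00:00',
    '2019-03-31 06:15:00', '2019-04-01 06:00:00', '2019-04-01 06:15:00',
])
ts = pd.to_datetime(dates)

n_timestamps = ts.nunique()      # distinct date + time values -> 7 pivot columns
n_times = ts.dt.time.nunique()   # distinct times of day -> 5 pivot columns
print(n_timestamps, n_times)
```

Pivoting on the time alone therefore gives one column per time slot, no matter how many days are in the data.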

df["dates"] = pd.to_datetime(df["date"]).dt.date
df['time'] = pd.to_datetime(df["date"]).dt.time

new_df = df.pivot(index=["id", "dates"], columns="time", values="val")
new_df.columns = [f"val{i + 1}" for i in range(new_df.shape[1])]
new_df.reset_index()

    id  dates       val1    val2    val3    val4    val5
0   1   2019-03-31  11.0    10.0    12.0    3.0     4.0
1   1   2019-04-01  5.0     6.0     NaN     NaN     NaN
2   2   2019-03-31  5.0     NaN     125.0   45.0    NaN
3   3   2019-03-31  31.0    -2.0    NaN     NaN     NaN
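One caveat with the approach above: the pivot only creates a column for a time that appears at least once in the data. If, say, no id had a 07:00 reading, you would get 4 columns. A possible way to always get 5 columns is to reindex the pivoted columns against the expected slots. This is only a sketch; the `EXPECTED_TIMES` list and the `to_wide` helper are names made up for this example, and it assumes the five slots are fixed and known in advance.

```python
import datetime as dt
import pandas as pd

# Assumption: these five slots are the only ones expected each day.
EXPECTED_TIMES = [dt.time(6, 0), dt.time(6, 15), dt.time(6, 30),
                  dt.time(6, 45), dt.time(7, 0)]

def to_wide(frame):
    """Return one row per (id, day) with one column per expected time slot."""
    ts = pd.to_datetime(frame["date"])
    tmp = frame.assign(dates=ts.dt.date, time=ts.dt.time)
    wide = tmp.pivot(index=["id", "dates"], columns="time", values="val")
    # Force exactly the expected slots, in order; slots with no data become NaN.
    wide = wide.reindex(columns=EXPECTED_TIMES)
    wide.columns = [f"val{i + 1}" for i in range(len(EXPECTED_TIMES))]
    return wide.reset_index()

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 1],
    "val": [11, 10, 12, 3, 4, 5, 125, 45, 31, -2, 5, 6],
    "date": ['2019-03-31 06:00:00', '2019-03-31 06:15:00', '2019-03-31 06:30:00',
             '2019-03-31 06:45:00', '2019-03-31 07:00:00', '2019-03-31 06:00:00',
             '2019-03-31 06:30:00', '2019-03-31 06:45:00', '2019-03-31 06:00:00',
             '2019-03-31 06:15:00', '2019-04-01 06:00:00', '2019-04-01 06:15:00'],
})
new_df = to_wide(df)
print(new_df)
```

Passing a list to `index` in `pivot` needs pandas 1.1 or later. Any time that is not in `EXPECTED_TIMES` would be dropped by the reindex, so this only fits data that really follows the fixed schedule.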