i generated timestemp index with 3 hours frequency and assigned it to dataframe that contained forecasted data of weather for next 10 days with 3 hours interval. due to 3 hours frequency date is repeating in index against each value but i want to make group of each date and its respective data i have tried groupby, but it entirely changed my dataframe values. is there any solution for this problem ?
df['Hours'] = pd.date_range(start= start_time , periods=81, freq='3h')
df['Days'] = df['Hours'].dt.day_name()
df.index = df.Hours
df = df.drop('Hours',1)
df = df.drop('Days',1)
##df.groupby(['Days', 'Hours'])['Days'].nunique()
df
MSL TEMPERATURE DPT RH% PRECIPITATION CLOUD COVER
Hours
2021-12-20 00:00:00 1019.4 7.9 -9.7 27.4 0.00 6.8
2021-12-20 03:00:00 1019.4 7.5 -9.9 27.9 0.00 8.9
2021-12-20 06:00:00 1018.3 6.7 -10.3 28.6 0.00 6.4
2021-12-20 09:00:00 1019.2 7.9 -9.0 29.1 0.00 0.8
2021-12-20 12:00:00 1018.6 14.5 -7.7 20.8 0.00 0.9
... ... ... ... ... ... ...
2021-12-29 12:00:00 1024.2 12.1 -1.4 38.9 0.00 82.8
2021-12-29 15:00:00 1021.5 14.2 -1.8 32.9 0.01 99.8
2021-12-29 18:00:00 1021.7 10.6 -1.3 43.7 0.00 99.9
2021-12-29 21:00:00 1023.7 8.7 -2.3 45.8 0.00 91.5
2021-12-30 00:00:00 1024.4 7.9 -2.4 48.1 0.00 59.7
CodePudding user response:
Pandas dataframe is a table structure, so only place you can merge some columns is index
. merge in quotes, because it is visualized like "merged" by pandas.
I simplified your sample data to show how it will be visualized.
>>> df
time MSL TEMPERATURE
2021-12-20T00:00:00 1019.4 7.9
2021-12-20T03:00:00 1019.4 7.5
2021-12-20T06:00:00 1018.3 6.7
2021-12-20T09:00:00 1019.2 7.9
2021-12-21T00:00:00 1019.2 7.9
2021-12-21T03:00:00 1019.2 7.9
2021-12-21T06:00:00 1019.2 7.9
2021-12-21T09:00:00 1019.2 7.9
2021-12-22T00:00:00 1019.2 7.9
Create a column to summarize.
>>> df['date'] = df.time.dt.date
>>> df['hour'] = df.time.dt.time
>>> df
time MSL TEMPERATURE date hour
2021-12-20T00:00:00 1019.4 7.9 2021-12-20 00:00:00
2021-12-20T03:00:00 1019.4 7.5 2021-12-20 03:00:00
2021-12-20T06:00:00 1018.3 6.7 2021-12-20 06:00:00
2021-12-20T09:00:00 1019.2 7.9 2021-12-20 09:00:00
2021-12-21T00:00:00 1019.2 7.9 2021-12-21 00:00:00
2021-12-21T03:00:00 1019.2 7.9 2021-12-21 03:00:00
2021-12-21T06:00:00 1019.2 7.9 2021-12-21 06:00:00
2021-12-21T09:00:00 1019.2 7.9 2021-12-21 09:00:00
2021-12-22T00:00:00 1019.2 7.9 2021-12-22 00:00:00
Then, use set_index
on those columns. This will create MultiIndex which will display data as grouped on the first index.
Please check https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html for more details.
>>> df.set_index(['date', 'hour'])
MSL TEMPERATURE
date hour
2021-12-20 00:00:00 1019.4 7.9
03:00:00 1019.4 7.5
06:00:00 1018.3 6.7
09:00:00 1019.2 7.9
2021-12-21 00:00:00 1019.2 7.9
03:00:00 1019.2 7.9
06:00:00 1019.2 7.9
09:00:00 1019.2 7.9
2021-12-22 00:00:00 1019.2 7.9