how to group repeating dates in index of dataframe in single date without making any change to dataf-CodePudding

i generated timestemp index with 3 hours frequency and assigned it to dataframe that contained forecasted data of weather for next 10 days with 3 hours interval. due to 3 hours frequency date is repeating in index against each value but i want to make group of each date and its respective data i have tried groupby, but it entirely changed my dataframe values. is there any solution for this problem ?

df['Hours'] = pd.date_range(start= start_time , periods=81, freq='3h')
df['Days'] = df['Hours'].dt.day_name()
df.index = df.Hours
df = df.drop('Hours',1)
df = df.drop('Days',1)
##df.groupby(['Days', 'Hours'])['Days'].nunique()
df

                     MSL    TEMPERATURE DPT RH% PRECIPITATION   CLOUD COVER
Hours                       
2021-12-20 00:00:00 1019.4  7.9 -9.7    27.4    0.00    6.8
2021-12-20 03:00:00 1019.4  7.5 -9.9    27.9    0.00    8.9
2021-12-20 06:00:00 1018.3  6.7 -10.3   28.6    0.00    6.4
2021-12-20 09:00:00 1019.2  7.9 -9.0    29.1    0.00    0.8
2021-12-20 12:00:00 1018.6  14.5    -7.7    20.8    0.00    0.9
... ... ... ... ... ... ...
2021-12-29 12:00:00 1024.2  12.1    -1.4    38.9    0.00    82.8
2021-12-29 15:00:00 1021.5  14.2    -1.8    32.9    0.01    99.8
2021-12-29 18:00:00 1021.7  10.6    -1.3    43.7    0.00    99.9
2021-12-29 21:00:00 1023.7  8.7 -2.3    45.8    0.00    91.5
2021-12-30 00:00:00 1024.4  7.9 -2.4    48.1    0.00    59.7

CodePudding user response：

Pandas dataframe is a table structure, so only place you can merge some columns is index. merge in quotes, because it is visualized like "merged" by pandas.

I simplified your sample data to show how it will be visualized.

>>> df
               time    MSL   TEMPERATURE
2021-12-20T00:00:00 1019.4           7.9 
2021-12-20T03:00:00 1019.4           7.5 
2021-12-20T06:00:00 1018.3           6.7 
2021-12-20T09:00:00 1019.2           7.9 
2021-12-21T00:00:00 1019.2           7.9 
2021-12-21T03:00:00 1019.2           7.9 
2021-12-21T06:00:00 1019.2           7.9 
2021-12-21T09:00:00 1019.2           7.9 
2021-12-22T00:00:00 1019.2           7.9

Create a column to summarize.

>>> df['date'] = df.time.dt.date
>>> df['hour'] = df.time.dt.time
>>> df
               time    MSL   TEMPERATURE       date     hour
2021-12-20T00:00:00 1019.4           7.9 2021-12-20 00:00:00
2021-12-20T03:00:00 1019.4           7.5 2021-12-20 03:00:00
2021-12-20T06:00:00 1018.3           6.7 2021-12-20 06:00:00
2021-12-20T09:00:00 1019.2           7.9 2021-12-20 09:00:00
2021-12-21T00:00:00 1019.2           7.9 2021-12-21 00:00:00
2021-12-21T03:00:00 1019.2           7.9 2021-12-21 03:00:00
2021-12-21T06:00:00 1019.2           7.9 2021-12-21 06:00:00
2021-12-21T09:00:00 1019.2           7.9 2021-12-21 09:00:00
2021-12-22T00:00:00 1019.2           7.9 2021-12-22 00:00:00

Then, use set_index on those columns. This will create MultiIndex which will display data as grouped on the first index.

Please check https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html for more details.

>>> df.set_index(['date', 'hour'])
                        MSL   TEMPERATURE 
date       hour
2021-12-20 00:00:00  1019.4           7.9 
           03:00:00  1019.4           7.5 
           06:00:00  1018.3           6.7 
           09:00:00  1019.2           7.9 
2021-12-21 00:00:00  1019.2           7.9 
           03:00:00  1019.2           7.9 
           06:00:00  1019.2           7.9 
           09:00:00  1019.2           7.9 
2021-12-22 00:00:00  1019.2           7.9