Home > Software design >  how to group repeating dates in index of dataframe in single date without making any change to dataf
how to group repeating dates in index of dataframe in single date without making any change to dataf

Time:12-22

i generated timestemp index with 3 hours frequency and assigned it to dataframe that contained forecasted data of weather for next 10 days with 3 hours interval. due to 3 hours frequency date is repeating in index against each value but i want to make group of each date and its respective data i have tried groupby, but it entirely changed my dataframe values. is there any solution for this problem ?

df['Hours'] = pd.date_range(start= start_time , periods=81, freq='3h')
df['Days'] = df['Hours'].dt.day_name()
df.index = df.Hours
df = df.drop('Hours',1)
df = df.drop('Days',1)
##df.groupby(['Days', 'Hours'])['Days'].nunique()
df
                     MSL    TEMPERATURE DPT RH% PRECIPITATION   CLOUD COVER
Hours                       
2021-12-20 00:00:00 1019.4  7.9 -9.7    27.4    0.00    6.8
2021-12-20 03:00:00 1019.4  7.5 -9.9    27.9    0.00    8.9
2021-12-20 06:00:00 1018.3  6.7 -10.3   28.6    0.00    6.4
2021-12-20 09:00:00 1019.2  7.9 -9.0    29.1    0.00    0.8
2021-12-20 12:00:00 1018.6  14.5    -7.7    20.8    0.00    0.9
... ... ... ... ... ... ...
2021-12-29 12:00:00 1024.2  12.1    -1.4    38.9    0.00    82.8
2021-12-29 15:00:00 1021.5  14.2    -1.8    32.9    0.01    99.8
2021-12-29 18:00:00 1021.7  10.6    -1.3    43.7    0.00    99.9
2021-12-29 21:00:00 1023.7  8.7 -2.3    45.8    0.00    91.5
2021-12-30 00:00:00 1024.4  7.9 -2.4    48.1    0.00    59.7

CodePudding user response:

Pandas dataframe is a table structure, so only place you can merge some columns is index. merge in quotes, because it is visualized like "merged" by pandas.

I simplified your sample data to show how it will be visualized.

>>> df
               time    MSL   TEMPERATURE
2021-12-20T00:00:00 1019.4           7.9 
2021-12-20T03:00:00 1019.4           7.5 
2021-12-20T06:00:00 1018.3           6.7 
2021-12-20T09:00:00 1019.2           7.9 
2021-12-21T00:00:00 1019.2           7.9 
2021-12-21T03:00:00 1019.2           7.9 
2021-12-21T06:00:00 1019.2           7.9 
2021-12-21T09:00:00 1019.2           7.9 
2021-12-22T00:00:00 1019.2           7.9 

Create a column to summarize.

>>> df['date'] = df.time.dt.date
>>> df['hour'] = df.time.dt.time
>>> df
               time    MSL   TEMPERATURE       date     hour
2021-12-20T00:00:00 1019.4           7.9 2021-12-20 00:00:00
2021-12-20T03:00:00 1019.4           7.5 2021-12-20 03:00:00
2021-12-20T06:00:00 1018.3           6.7 2021-12-20 06:00:00
2021-12-20T09:00:00 1019.2           7.9 2021-12-20 09:00:00
2021-12-21T00:00:00 1019.2           7.9 2021-12-21 00:00:00
2021-12-21T03:00:00 1019.2           7.9 2021-12-21 03:00:00
2021-12-21T06:00:00 1019.2           7.9 2021-12-21 06:00:00
2021-12-21T09:00:00 1019.2           7.9 2021-12-21 09:00:00
2021-12-22T00:00:00 1019.2           7.9 2021-12-22 00:00:00

Then, use set_index on those columns. This will create MultiIndex which will display data as grouped on the first index.

Please check https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html for more details.

>>> df.set_index(['date', 'hour'])
                        MSL   TEMPERATURE 
date       hour
2021-12-20 00:00:00  1019.4           7.9 
           03:00:00  1019.4           7.5 
           06:00:00  1018.3           6.7 
           09:00:00  1019.2           7.9 
2021-12-21 00:00:00  1019.2           7.9 
           03:00:00  1019.2           7.9 
           06:00:00  1019.2           7.9 
           09:00:00  1019.2           7.9 
2021-12-22 00:00:00  1019.2           7.9 
  • Related