Pandas: calculating mean value of multiple columns using datetime and Grouper removes columns or doe

Time:02-16

As part of a larger task, I want to calculate the monthly mean values for each specific station. This is already difficult to do, but I am getting close.

The dataframe has many columns, but ultimately I only use the following information:

        Date        Value        Station_Name
0       2006-01-03  18           2
1       2006-01-04  12           2
2       2006-01-05  11           2
3       2006-01-06  10           2
4       2006-01-09  22           2
...     ...     ...
3510    2006-12-23  47           45
3511    2006-12-24  46           45
3512    2006-12-26  35           45
3513    2006-12-27  35           45
3514    2006-12-30  28           45

I am running into two issues, using:

df.groupby(['Station_Name', pd.Grouper(freq='M')])['Value'].mean()

It results in something like:

Station_Name  Date
2             2003-01-31         29.448387
              2003-02-28         30.617857
              2003-03-31         28.758065
              2003-04-30         28.392593
              2003-05-31         30.318519
                                    ...    
45            2003-09-30         16.160000
              2003-10-31         18.906452
              2003-11-30         26.296667
              2003-12-31         30.306667
              2004-01-31         29.330000

I can't seem to use this as a regular dataframe. The dates are also off: instead of labelling each month, the index shows the last day of the month. The station name appears only once per group as an index level rather than on every row, and the mean value has no column name at all. This isn't a dataframe but a pandas.core.series.Series. I can't convert it back because the result isn't correct, even though using the .to_frame() method shows that it is indeed a DataFrame. I don't get this part.
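For illustration, here is a minimal sketch of why the result is a Series and how it can be flattened. The sample rows are hypothetical, and it assumes `Date` is a regular datetime column (hence `key="Date"`). It uses `freq="MS"` (month start) instead of `'M'` so each month is labelled by its first day.

```python
import pandas as pd

# Hypothetical sample rows shaped like the frame in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2006-01-03", "2006-01-04", "2006-02-01", "2006-12-23", "2006-12-24"]
    ),
    "Value": [18, 12, 10, 47, 46],
    "Station_Name": [2, 2, 2, 45, 45],
})

# Selecting a single column after groupby yields a Series whose index
# is a MultiIndex of (Station_Name, Date); that is why the station is
# printed only once per group and the values have no column header.
monthly = df.groupby(
    ["Station_Name", pd.Grouper(key="Date", freq="MS")]
)["Value"].mean()

# reset_index moves both index levels back into ordinary columns.
flat = monthly.reset_index(name="Value")
print(flat)
```

Only station/month combinations that actually occur in the data show up in `flat`, so stations with missing months are handled without extra work.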

I found that, to get a normal dataframe back, I can pass

as_index = False

to the groupby method. But this results in the months not being shown:

df.groupby(['Station_Name', pd.Grouper(freq='M')], as_index = False)['Value'].mean()

Gives:

   Station_Name          Value
0             2      29.448387
1             2      30.617857
2             2      28.758065
3             2      28.392593
4             2      30.318519
... ... ...
142          45      16.160000
143          45      18.906452
144          45      26.296667
145          45      30.306667
146          45      29.330000

I can't just simply add the month later, as not every station has an observation in every month.

I've tried using other methods, such as

df.resample("M").mean()

But it doesn't seem possible to do this on multiple columns. It returns the mean value of everything.
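As a rough sketch, resampling can be applied per station by grouping first and resampling inside each group. The sample rows below are hypothetical, and `"MS"` (month start) is used as the bin frequency. Within one station's date range, a month without observations comes back as NaN, so it is dropped here.

```python
import pandas as pd

# Hypothetical sample rows; station 2 has no observations in February.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2006-01-03", "2006-01-05", "2006-03-10", "2006-12-23", "2006-12-24"]
    ),
    "Value": [18, 12, 10, 47, 46],
    "Station_Name": [2, 2, 2, 45, 45],
})

# resample needs a datetime index, so Date becomes the index first;
# grouping by station keeps the monthly means separate per station.
per_station = (
    df.set_index("Date")
      .groupby("Station_Name")["Value"]
      .resample("MS")
      .mean()
)

# Empty monthly bins inside a station's date range are NaN; drop them
# and turn the (Station_Name, Date) index back into columns.
result = per_station.dropna().reset_index()
print(result)
```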

Edit: This is ultimately what I would want.

   Station_Name      Date               Value
0             2   2003-01           29.448387
1             2   2003-02           30.617857
2             2   2003-03           28.758065
3             2   2003-04           28.392593
4             2   2003-05           30.318519
... ... ...
142          45   2003-08           16.160000
143          45   2003-09           18.906452
144          45   2003-10           26.296667
145          45   2003-11           30.306667
146          45   2003-12           29.330000

CodePudding user response:

OK, how about this:

df = df.groupby(['Station_Name',df['Date'].dt.to_period('M')])['Value'].mean().reset_index()

Output:
   Station_Name     Date  Value
0             2  2006-01   14.6
1            45  2006-12   38.2
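To make this reproducible, here is a self-contained sketch built from the ten rows shown in the question. It assumes `Date` has to be converted with `pd.to_datetime` first (the `.dt` accessor needs a datetime column). The final `astype(str)` step is an optional addition for when plain `YYYY-MM` strings are preferred over Period values.

```python
import pandas as pd

# The ten sample rows visible in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2006-01-03", "2006-01-04", "2006-01-05", "2006-01-06", "2006-01-09",
        "2006-12-23", "2006-12-24", "2006-12-26", "2006-12-27", "2006-12-30",
    ]),
    "Value": [18, 12, 11, 10, 22, 47, 46, 35, 35, 28],
    "Station_Name": [2] * 5 + [45] * 5,
})

# Group by station and by the month each date falls in, then flatten.
result = (
    df.groupby(["Station_Name", df["Date"].dt.to_period("M")])["Value"]
      .mean()
      .reset_index()
)

# Optional: turn the monthly Period values into 'YYYY-MM' strings.
result["Date"] = result["Date"].astype(str)
print(result)
```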