Getting the monthly average for each year and avoiding negative values

I have the following albedo data, and I am trying to get the average value for each month within a single year.

Input data:

           date  blue_sky_albedo
0    2000-02-24        -9999.000
1    2000-02-25        -9999.000
2    2000-02-26        -9999.000
3    2000-02-27        -9999.000
4    2000-02-28            0.221
...         ...              ...
7866 2021-09-10            0.265
7867 2021-09-11            0.264
7868 2021-09-12            0.264
7869 2021-09-13            0.264
7870 2021-09-14            0.265

I am creating an Excel file for each year, and I want to skip the negative values (-9999.000), because there is no data for those days. (Maybe by replacing them with NaN?)
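Replacing the fill value with NaN does work, because pandas' `mean()` skips NaN by default. Below is a minimal sketch on a few made-up rows (not the real dataset), assuming the fill value is exactly -9999 as in the sample above:

```python
import numpy as np
import pandas as pd

# A few illustrative rows (made up); -9999 marks days without data
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-02-26", "2000-02-27", "2000-02-28", "2000-03-01"]),
    "blue_sky_albedo": [-9999.0, -9999.0, 0.221, 0.230],
})

# Turn the -9999 fill value into NaN so it cannot distort the averages
df["blue_sky_albedo"] = df["blue_sky_albedo"].replace(-9999.0, np.nan)

# mean() skips NaN by default, so February averages only the one valid day
monthly = df.groupby(df["date"].dt.to_period("M"))["blue_sky_albedo"].mean()
print(monthly)
```

Filtering the rows out (for example `df[df["blue_sky_albedo"] > 0]`) gives the same averages; replacing with NaN just keeps the rows in place.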

My Code:

import pandas as pd

file = pd.read_csv('file.csv', 
                    sep = ';', 
                    skiprows = 16,
                    parse_dates = ['date'])

# %% 
#* Create an excel file time (15 min) for each year.
for year_XX in range(pd.to_datetime(file['date']).dt.year.min(), 
                     pd.to_datetime(file['date']).dt.year.max() + 1):
    data_by_whole_year = file[pd.to_datetime(file['date']).dt.year == year_XX]
    data_by_whole_year.groupby(pd.PeriodIndex(data_by_whole_year['date'], freq = "M"))['blue_sky_albedo'].mean().reset_index()
    print('Creating file (Month Average) for the year: ' + str(year_XX))
    print(data_by_whole_year)

However, my code prints the unaggregated daily data for each year instead of the monthly averages I want. Where is my mistake?

Results:

Creating file (Month Average) for the year: 2000
          date  blue_sky_albedo
0   2000-02-24        -9999.000
1   2000-02-25        -9999.000
2   2000-02-26        -9999.000
3   2000-02-27        -9999.000
4   2000-02-28            0.221
..         ...              ...
307 2000-12-27            0.250
308 2000-12-28            0.251
309 2000-12-29            0.251
310 2000-12-30            0.250
311 2000-12-31            0.252

Answer:

You aren't assigning the result of your `groupby` to anything, so `data_by_whole_year` is still the unaggregated daily data when you print it.
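For comparison, here is a sketch of the question's own loop with the `groupby` result stored in a variable and the negative fill values filtered out first. Wrapping it in a function called `monthly_means_by_year` and returning a dict instead of writing files are my own choices for illustration (hypothetical, not from the question), and `dt.to_period("M")` is used as the grouping key in place of `pd.PeriodIndex(...)`:

```python
import pandas as pd


def monthly_means_by_year(file: pd.DataFrame) -> dict:
    """Hypothetical helper: return {year: DataFrame of monthly mean albedo}."""
    file = file.assign(date=pd.to_datetime(file["date"]))
    # Keep only valid albedo values; the -9999 fill values are negative
    valid = file[file["blue_sky_albedo"] > 0]

    results = {}
    for year_XX in range(file["date"].dt.year.min(), file["date"].dt.year.max() + 1):
        data_by_whole_year = valid[valid["date"].dt.year == year_XX]
        # The fix: store the aggregated result instead of discarding it
        monthly_mean = (
            data_by_whole_year
            .groupby(data_by_whole_year["date"].dt.to_period("M"))["blue_sky_albedo"]
            .mean()
            .reset_index()
        )
        print("Creating file (Month Average) for the year: " + str(year_XX))
        print(monthly_mean)
        # This is where the per-year file would be written
        results[year_XX] = monthly_mean
    return results
```

Each value in the returned dict has one row per month that contains at least one valid day, with columns `date` (a monthly period) and `blue_sky_albedo`.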

Separately, you could also get what you need using resample after filtering on positive values only:

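import pandas as pd

# df is the DataFrame read from the CSV (named `file` in the question)
# convert to datetime if needed
df["date"] = pd.to_datetime(df["date"])

# get monthly averages of only positive values (drops the -9999 fill values)
# "M" is month end; pandas >= 2.2 prefers the alias "ME"
monthly = df[df["blue_sky_albedo"] > 0].resample("M", on="date").mean()

# generate yearly files, one per year present in the monthly index
for year in monthly.index.year.unique():
    print(f"Creating file (Month Average) for the year: {year}")
    monthly[monthly.index.year == year].to_csv(f"data_{year}.csv")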
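One difference between `resample` and `groupby` is worth knowing here: `resample` produces a row for every calendar month in the date range, so a month containing only -9999 values shows up as NaN, whereas grouping by period only returns months that contain valid data. A small sketch on made-up data; I use `"MS"` (month start) rather than `"M"` only so the example doesn't depend on the month-end alias that pandas 2.2 renamed to `"ME"`:

```python
import pandas as pd

# Made-up data: valid values in January and March, only a fill value in February
df = pd.DataFrame({
    "date": pd.to_datetime(["2001-01-15", "2001-02-10", "2001-03-05"]),
    "blue_sky_albedo": [0.25, -9999.0, 0.27],
})
valid = df[df["blue_sky_albedo"] > 0]

# resample: one row per calendar month in the range; February becomes NaN
by_resample = valid.resample("MS", on="date")["blue_sky_albedo"].mean()

# groupby on monthly periods: only months that actually contain valid data
by_groupby = valid.groupby(valid["date"].dt.to_period("M"))["blue_sky_albedo"].mean()

print(by_resample)
print(by_groupby)
```

If empty months should not appear in the yearly files, calling `.dropna()` on the resampled result removes them.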