What I need is to group dataset only by months, so in the end the would be only 12 rows with data aggregated for each month. Unfortunately, when I try to use pd.Grouper it doesn't work (well, it changes days to 31, but it's not what I need). Maybe I should try using regex and delete years names and then try? Still unsure. Anyway, I hope that you'll help me.. P.S. Does Grouper works only with date as index? Without setting date to index, even if date in datetime format it just doesn't work..
ds = pd.read_csv('https://raw.githubusercontent.com/Lotaristo/Datasets/main/Projects/Homework/homework.csv')
ds.date = pd.to_datetime(ds.date)
ds.set_index('date', inplace=True)
ds.groupby(pd.Grouper(freq='M')).mean()
CodePudding user response:
One very simple solution may be to create another column month
as a substring of date
, and then group on this new column.
ds = pd.read_csv('https://raw.githubusercontent.com/Lotaristo/Datasets/main/Projects/Homework/homework.csv')
ds['month'] = ds.date.str[5:7]
aggr = ds.groupby('month').mean()
CodePudding user response:
You can use .month
to access months of the index:
out = ds.groupby(ds.index.month).mean()
Output:
гдз по русскому языку 5 класс гдз по английскому языку 5 класс ...
date
1 27.200000 28.733333 ...
2 34.466667 34.000000 ...
3 26.200000 27.000000 ...
4 36.600000 36.666667 ...
5 20.133333 20.866667 ...
6 0.266667 0.533333 ...
7 0.066667 0.266667 ...
8 0.000000 0.533333 ...
9 33.400000 30.733333 ...
10 29.466667 32.666667 ...
11 31.000000 31.800000 ...
12 29.600000 32.733333 ...
[12 rows x 69 columns]