I want to evaluate a precipitation data set. The data is in a CSV file, which I have read into a pandas DataFrame. It looks like this:
year month day value
0 1981 1 1 0.522592
1 1981 1 2 2.692495
2 1981 1 3 0.556698
3 1981 1 4 0.000000
4 1981 1 5 0.000000
... ... ... ... ...
43824 2100 12 27 0.000000
43825 2100 12 28 0.185120
43826 2100 12 29 10.252080
43827 2100 12 30 13.389290
43828 2100 12 31 3.523566
Now I want to convert the daily precipitation values into monthly totals, i.e. the sum of all daily values for each month of each year. I assume I need a loop or something similar, but I do not know how to proceed. Maybe via a conditional selection on 'year' and 'month'? I would be very happy about feedback! :)
That's what I have tried so far:
for i in range(len(dataframe)):
    print(dataframe.loc[i, 'year'], dataframe.loc[i, 'month'])
CodePudding user response:
Have you tried groupby?
df.groupby(['year', 'month'])['value'].agg('sum')
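To make this concrete, here is a minimal sketch of the groupby approach on a tiny made-up sample. The values are invented for illustration and are not from the question's file; only the column names follow the question's table. Passing as_index=False is my own choice so that 'year' and 'month' stay as regular columns.

```python
import pandas as pd

# Hypothetical sample in the same layout as the question's table
# (the numbers are invented, not taken from the real data set).
df = pd.DataFrame({
    "year":  [1981, 1981, 1981, 1981],
    "month": [1, 1, 2, 2],
    "day":   [1, 2, 1, 2],
    "value": [0.5, 2.5, 1.0, 0.0],
})

# Sum the daily values within each (year, month) pair.
# as_index=False keeps 'year' and 'month' as ordinary columns.
monthly = df.groupby(["year", "month"], as_index=False)["value"].sum()
print(monthly)
```

No explicit loop is needed, since groupby handles the selection per year and month internally.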
CodePudding user response:
I would start out by making a single column with the date:
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
From here you can make the date the index:
df.set_index('date', inplace=True)
# I'll drop the unneeded year, month, and day columns as well.
df = df[['value']]
My data now looks like:
value
date
1981-01-01 0.522592
1981-01-02 2.692495
1981-01-03 0.556698
1981-01-04 0.000000
1981-01-05 0.000000
From here, let's try resampling the data!
# Let's do a 2-day sum. For monthly totals, replace '2d' with 'M' ('ME' in pandas 2.2 and later).
df.resample('2d').sum()
Output:
value
date
1981-01-01 3.215087
1981-01-03 0.556698
1981-01-05 0.000000
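For the monthly case, a sketch along the same lines might look like the following. The sample values are invented, and I am using the 'MS' (month start) alias here as an assumption on my part, because it labels each month by its first day and avoids the 'M' versus 'ME' naming difference between pandas versions.

```python
import pandas as pd

# Hypothetical sample in the same layout as the question's table
# (the numbers are invented, not taken from the real data set).
df = pd.DataFrame({
    "year":  [1981, 1981, 1981, 1981],
    "month": [1, 1, 2, 2],
    "day":   [1, 2, 1, 2],
    "value": [0.5, 2.5, 1.0, 0.0],
})

# Build a single datetime column from the year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Index by date, then sum per calendar month.
# 'MS' labels each bin with the first day of the month.
monthly = df.set_index("date")["value"].resample("MS").sum()
print(monthly)
```

The result is a Series indexed by the first day of each month, which is convenient for plotting a time series later.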
Hopefully this gives you something to start with.