Evaluation of a data set with conditional selection of columns

I want to evaluate a data set with precipitation data. The data is available as a CSV file, which I have read into a pandas DataFrame. The resulting table looks like this:

       year  month  day      value
0      1981      1    1   0.522592
1      1981      1    2   2.692495
2      1981      1    3   0.556698
3      1981      1    4   0.000000
4      1981      1    5   0.000000
...     ...    ...  ...        ...
43824  2100     12   27   0.000000
43825  2100     12   28   0.185120
43826  2100     12   29  10.252080
43827  2100     12   30  13.389290
43828  2100     12   31   3.523566

Now I want to convert the daily precipitation values into monthly totals, i.e. for each month of each year I need the sum of all daily values in that month. I probably need a loop or something similar, but I do not know how to proceed. Maybe via a conditional selection over 'year' and 'month'? I would be very happy about any feedback! :)

This is what I have tried so far:

for i in range(len(dataframe)):
    print(dataframe.loc[i, 'year'], dataframe.loc[i, 'month'])

CodePudding user response:

Have you tried groupby?

dataframe.groupby(['year', 'month'])['value'].sum()
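
To make the groupby idea concrete, here is a minimal sketch on a tiny made-up sample. The four rows and their values are assumptions for illustration, not the asker's data; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample: two days in each of two months (not the real data set).
dataframe = pd.DataFrame({
    "year":  [1981, 1981, 1981, 1981],
    "month": [1, 1, 2, 2],
    "day":   [1, 2, 1, 2],
    "value": [0.5, 2.5, 1.0, 0.25],
})

# Sum the daily values within each (year, month) pair; no explicit loop needed.
monthly = dataframe.groupby(["year", "month"])["value"].sum()

# reset_index() turns the (year, month) index back into ordinary columns.
monthly_table = monthly.reset_index()
print(monthly_table)
```

Grouping on both 'year' and 'month' matters here: grouping on 'month' alone would add up every January from 1981 to 2100 into one number.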

CodePudding user response:

I would start out by making a single column with the date:

df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

From here you can make the date the index:

df.set_index('date', inplace=True)
# I'll drop the unneeded year, month, and day columns as well.
df = df[['value']]

My data now looks like:

               value
date
1981-01-01  0.522592
1981-01-02  2.692495
1981-01-03  0.556698
1981-01-04  0.000000
1981-01-05  0.000000

From here, let's try resampling the data!

# Let's do a 2-day sum. For monthly sums, replace '2d' with 'M' (pandas 2.2 and later prefer 'ME').
df.resample('2d').sum()

Output:

               value
date
1981-01-01  3.215087
1981-01-03  0.556698
1981-01-05  0.000000
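
For the monthly case, a self-contained sketch could look like the following. The sample rows are made up for illustration, and I am using the 'MS' (month start) alias here as one possible choice, so each monthly total is labelled with the first day of its month.

```python
import pandas as pd

# Hypothetical sample: two days in each of two months (not the real data set).
df = pd.DataFrame({
    "year":  [1981, 1981, 1981, 1981],
    "month": [1, 1, 2, 2],
    "day":   [1, 2, 1, 2],
    "value": [0.5, 2.5, 1.0, 0.25],
})

# Build a datetime index from the year/month/day columns, keep only 'value'.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
df = df.set_index("date")[["value"]]

# Resample into calendar months and sum the daily values in each bin.
monthly = df.resample("MS").sum()
print(monthly)
```

If you would rather have each total labelled with the last day of the month, swap 'MS' for the month-end alias mentioned above.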

Hopefully this gives you something to start with.