I have a data frame with the values of precipitations day per day. I would like to do a sort of resample, so instead of day per day the data is collected year per year and every year has a column that contains the number of times it rained more than a certain value.
Date | Precipitation |
---|---|
2000-01-01 | 1 |
2000-01-03 | 6 |
2000-01-03 | 5 |
2001-01-01 | 3 |
2001-01-02 | 1 |
2001-01-03 | 0 |
2002-01-01 | 10 |
2002-01-02 | 8 |
2002-01-03 | 12 |
what I want is to count every year how many times Precipitation > 2
Date | Count |
---|---|
2000 | 2 |
2001 | 1 |
2002 | 3 |
I tried using resample()
but with no results
CodePudding user response:
@Tatthew you can do this with GroupBy.apply:
df.groupby(df.Date.dt.year).apply(lambda df: df.Precipitation[df.Precipitation > 2].count())
CodePudding user response:
You can use this bit of code:
# convert "Precipitation" and "date" values to proper types
df['Precipitation'] = df['Precipitation'].astype(int)
df["date"] = pd.to_datetime(df["date"])
# find rows that have "Precipitation" > 2
df['Count']= df.apply(lambda x: x["Precipitation"] > 2, axis=1)
# group df by year and drop the "Precipitation" column
df.groupby(df['date'].dt.year).sum().drop(columns=['Precipitation'])