I have 4 columns such as the ones below (but a total of 50,000 values not shown here):
Date Speed Area Incidents
1/1/2016 6.5 Maly L
1/2/2016 7.7 Maly H
1/3/2016 14 Maly H
...
1/1/2017 5.5 Reet M
1/12/2017 9.8 Reet M
4/8/2017 3 Reet H
3/12/2017 5.8 Anlow L
I need to try and find the average speed recorded in all of 2017 in the area of Reet where the incidents were M. So the output for this is supposed to be: 7.65.
I have absolutely no idea what I'm missing, and so far I have tried using date_range() and set_index with .describe() for my average requirement but I couldn't get it right.
CodePudding user response:
Convert Date
to datetimes and aggregate year with another columns for sum:
df['Date'] = pd.to_datetime(df['Date'])
df1 = (df.groupby([df['Date'].dt.year.rename('Year'), 'Area','Incidents']).Speed
.sum()
.reset_index())
CodePudding user response:
You can use masks
to select the data you want, and then take the mean:
import pandas as pd
data = {'Date' : ['1/1/2016', '1/2/2016', '1/3/2016', '1/1/2017', '1/12/2017', '4/8/2017', '3/12/2017'],
'Speed' : [6.5, 7.7, 14, 5.5, 9.8, 3, 5.8],
'Area' : ['Maly', 'Maly', 'Maly', 'Reet', 'Reet', 'Reet', 'Anlow'],
'Incidents':['L', 'H', 'H', 'M', 'M', 'H', 'L']}
df = pd.DataFrame(data)
# Convert the 'date' to datetime and set it as the index
df['Date'] = pd.to_datetime(df['Date'])
df.index = df['Date']
# Create the mask(s) you want and take the mean of the chosen column
mean_wind_speed = df[(df.index.year==2017) & (df.Area=='Reet') & (df.Incidents == 'M')]['Speed'].mean()
print(mean_wind_speed)
outputs:
7.65
Edit
one can also use query
, with a new column for Years:
df['Year'] = df.index.year
mean_wind_speed = df.query('Year == 2017 and Area == "Reet" and Incidents=="M"')['Speed'].mean()