I have a csv dataset where I have a column name "Types of Incidents" and another column named "Number of units".
Using Python and Pandas I am trying to find the average of "Number of units" when the value in type of incidents is 111. (It is found multiple times).
I have tried searching for multiple pandas methods but couldn't find how to find it on a huge dataset.
Here is the question:
What is the ratio of the average number of units that arrive to a scene of an incident classified as '111 - Building fire' to the number that arrive for '651 - Smoke scare, odor of smoke'?
CodePudding user response:
An alternate to ML-Nielsen's value specific answer:
df.groupby('Types of Incidents')['Number of units'].mean()
This will provide the average Number of units
for all Incident Types
.
You can specify multiple columns as well if needed.
Reproducible Example:
data = {
"Incident_Type": [111, 380, 390, 111, 651, 651],
"Number_of_units": [50, 40, 45, 99, 12, 13]
}
data = pd.DataFrame(data)
data
Incident_Type Number_of_units
0 111 50
1 380 40
2 390 45
3 111 99
4 651 12
5 651 13
data.groupby('Incident_Type')['Number_of_units'].mean()
Incident_Type
111 74.5
380 40.0
390 45.0
651 12.5
Name: Number_of_units, dtype: float64
Now if you wish to find the ratios of the units you will need to store this result as a dataframe.
average_units = data.groupby('Incident_Type')['Number_of_units'].mean().to_frame()
average_units = average_units.reset_index()
average_units
Incident_Type Number_of_units
0 111 74.5
1 380 40.0
2 390 45.0
3 651 12.5
So we have our result stored in a dataframe called average_units
.
incident1_units = average_units[average_units['Incident_Type']==111]['Number_of_units'].values[0]
incident2_units = average_units[average_units['Incident_Type']==651]['Number_of_units'].values[0]
incident1_units / incident2_units
5.96
CodePudding user response:
If I understand correctly, you probably have to first select the right rows and then calculate the mean. Something like this:
df.loc[df['Types of Incidents']==111, 'Number of units'].mean()
This will give you the mean of Number of units
where the condition df['Types of Incidents']==111
is true.