Find the average of pandas dataframe column(s) for multiple or all rows-CodePudding

I have a csv dataset where I have a column name "Types of Incidents" and another column named "Number of units".

Using Python and Pandas I am trying to find the average of "Number of units" when the value in type of incidents is 111. (It is found multiple times).

I have tried searching for multiple pandas methods but couldn't find how to find it on a huge dataset.

Here is the question:

What is the ratio of the average number of units that arrive to a scene of an incident classified as '111 - Building fire' to the number that arrive for '651 - Smoke scare, odor of smoke'?

CodePudding user response：

An alternate to ML-Nielsen's value specific answer:

df.groupby('Types of Incidents')['Number of units'].mean()

This will provide the average Number of units for all Incident Types. You can specify multiple columns as well if needed.

Reproducible Example:

data = {
  "Incident_Type": [111, 380, 390, 111, 651, 651],
  "Number_of_units": [50, 40, 45, 99, 12, 13]
}
data = pd.DataFrame(data)
data

   Incident_Type  Number_of_units
0            111               50
1            380               40
2            390               45
3            111               99
4            651               12
5            651               13

data.groupby('Incident_Type')['Number_of_units'].mean()

Incident_Type
111    74.5
380    40.0
390    45.0
651    12.5
Name: Number_of_units, dtype: float64

Now if you wish to find the ratios of the units you will need to store this result as a dataframe.

average_units = data.groupby('Incident_Type')['Number_of_units'].mean().to_frame()
average_units = average_units.reset_index() 
average_units
   Incident_Type  Number_of_units
0            111             74.5
1            380             40.0
2            390             45.0
3            651             12.5

So we have our result stored in a dataframe called average_units.

incident1_units = average_units[average_units['Incident_Type']==111]['Number_of_units'].values[0]
incident2_units = average_units[average_units['Incident_Type']==651]['Number_of_units'].values[0]
incident1_units / incident2_units
5.96

CodePudding user response：

If I understand correctly, you probably have to first select the right rows and then calculate the mean. Something like this:

df.loc[df['Types of Incidents']==111, 'Number of units'].mean()

This will give you the mean of Number of units where the condition df['Types of Incidents']==111 is true.