I am calculating the mean, median and mode of a pandas DataFrame column using .mean(), .median() and .mode(), but an index appears in one of the results (the mode):
def largeStats(dataframe):
    dataframe.drop(dataframe.index[dataframe['large_airport'] != 'Y'], inplace=True)
    mean = dataframe['frequency_mhz'].mean()
    mode = dataframe['frequency_mhz'].mode()
    median = dataframe['frequency_mhz'].median()
    print("The mean freq of large airports is", mean)
    print("The most common freq of large airports is", mode)
    print("The middle freq of large airports is", median)

print(largeStats(df))
This prints:
The mean freq of large airports is 120.00752293577986
The most common freq of large airports is 0 121.75
1 122.10
dtype: float64
The middle freq of large airports is 121.85
None
I want it to print just the number(s) for each:
The mean freq of large airports is 120.00752293577986
The most common freq of large airports is 121.75 & 122.10
The middle freq of large airports is 121.85
I know the index is there because there are two mode values, but how can I remove it?
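A minimal sketch of a variant of the function above that prints the output asked for. It assumes the same column names as the question; the sample DataFrame is hypothetical. It filters with a boolean mask instead of drop(..., inplace=True), so the caller's DataFrame is not modified, converts the modes to a list so no index is printed, and returns the values, which also avoids the trailing None.

```python
import pandas as pd

def largeStats(dataframe):
    # Select large-airport frequencies with a boolean mask;
    # unlike drop(..., inplace=True) this leaves `dataframe` unchanged.
    freq = dataframe.loc[dataframe['large_airport'] == 'Y', 'frequency_mhz']
    mean = freq.mean()
    median = freq.median()
    # mode() always returns a Series (one row per tied mode);
    # tolist() drops the index and gives plain Python floats.
    modes = freq.mode().tolist()
    print("The mean freq of large airports is", mean)
    print("The most common freq of large airports is",
          " & ".join(f"{m:.2f}" for m in modes))
    print("The middle freq of large airports is", median)
    # Returning the values means print(largeStats(df)) no longer shows None.
    return mean, modes, median

# Hypothetical sample data with two tied modes (121.75 and 122.10).
df = pd.DataFrame({
    'large_airport': ['Y', 'Y', 'Y', 'Y', 'N'],
    'frequency_mhz': [121.75, 121.75, 122.10, 122.10, 118.0],
})
mean, modes, median = largeStats(df)
```

Formatting with :.2f keeps the trailing zero, so 122.1 is shown as 122.10, as in the desired output.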
Answer:
This would fix it:
mode = dataframe['frequency_mhz'].mode().values[0]
The mode() function returns a pandas Series, so .values[0] takes the first item of that Series. If there are several modes, only the first one is kept.
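A small sketch of what .values[0] does, using a hypothetical Series with two equally common values. Because mode() returns its results in sorted order, the first element is the smallest mode; .iloc[0] gives the same value directly on the Series.

```python
import pandas as pd

# Hypothetical frequencies where 121.75 and 122.10 tie for most common.
freq = pd.Series([121.75, 121.75, 122.10, 122.10, 118.0], name='frequency_mhz')

modes = freq.mode()        # Series with two rows, index 0 and 1
first = modes.values[0]    # first mode only, as a NumPy scalar
same = modes.iloc[0]       # positional access on the Series itself

print("The most common freq is", first)
```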
Answer:
You can turn a pandas Series into a NumPy array using the .values property:
mode = dataframe['frequency_mhz'].mode().values
This keeps all the modes but drops the index labels.
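A sketch comparing .values with two related conversions on a hypothetical Series. Current pandas documentation recommends Series.to_numpy() over .values; .tolist() gives a plain Python list if a NumPy array is not needed.

```python
import pandas as pd

# Hypothetical sample frequencies with two tied modes.
freq = pd.Series([121.75, 121.75, 122.10, 122.10, 118.0], name='frequency_mhz')

modes_array = freq.mode().values       # numpy.ndarray, index labels dropped
modes_numpy = freq.mode().to_numpy()   # same data via the documented method
modes_list = freq.mode().tolist()      # plain Python list of floats

print("The most common freq of large airports is", modes_array)
```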
Answer:
Because Series.mode can return one or more values, you need to select the first value to get a scalar:
The mode is the value that appears most often. There can be multiple modes.
print("The most common freq of large airports is", mode.iat[0])
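A self-contained sketch of the .iat[0] approach, with hypothetical data. The helper first_mode is a hypothetical name added for illustration: mode() of an empty Series is itself empty, so .iat[0] would fail there, and the helper returns None instead.

```python
import pandas as pd

# Hypothetical data with a single mode (121.75 appears twice).
df = pd.DataFrame({'frequency_mhz': [121.75, 121.75, 122.10, 118.0]})

mode = df['frequency_mhz'].mode()   # a Series, even with only one mode
# iat[0] is positional scalar access to the first element.
print("The most common freq of large airports is", mode.iat[0])

def first_mode(series):
    """Hypothetical helper: first mode, or None if there is no data."""
    m = series.mode()
    return m.iat[0] if not m.empty else None
```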