I have a pandas dataframe and I need to get the mode of a groupby object. It was working but it'

Time:04-23

I really hope someone can help with this. It was working, but now it suddenly isn't, and I can't work out why. Consider the following:

def generate_mode(option):
    averages_text.delete('1.0', tk.END)
    if option == 'Large Airport':
        df = data.query('type_airport_large_airport == 1')
        mode = df.groupby('type_airport_large_airport')['frequency_mhz'].agg(pd.Series.mode)
        averages_text.insert('1.0', 'The mode for large airport is {}'.format(mode.iat[0]))
    
    elif option == 'Frequency':
        freq_data = data.query('frequency_mhz > 100')
        mode = freq_data['frequency_mhz'].mode()
        averages_text.insert('1.0', 'The mode for frequency over 100mhz is {}'.format(mode.iat[0]))

This line:

mode = df.groupby('type_airport_large_airport')['frequency_mhz'].agg(pd.Series.mode)

Throws an error:

ValueError: Must produce aggregated value

If I change the aggregation to .agg(pd.Series.median), it works fine.

Can anyone see what could be happening?

CodePudding user response:

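As pointed out by Boris, pd.Series.mode() always returns a pd.Series (which is kinda an indexed list) rather than a single value, because a column can have more than one mode. agg() expects one scalar per group, which is why median works and mode doesn't. A workaround for this would be: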

df.groupby('type_airport_large_airport')['frequency_mhz'].agg(lambda x: pd.Series.mode(x)[0])

This captures the first element of the returned values (the first mode). When using this approach, keep in mind that pd.Series.mode() returns every distinct value when all values occur equally often (so we still capture the first one), and if there are two or more modes, only the first one is kept. The modes come back sorted, so "first" means the smallest.
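Here's a small sketch of that workaround on a made-up frame, in case it helps to see the tie behaviour. Only the column names come from the question; the values and the first_mode helper are assumptions for illustration. It uses .iat[0] instead of [0] and falls back to NaN when a group has no mode at all (for example, when every value is NaN).

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
data = pd.DataFrame({
    "type_airport_large_airport": [1, 1, 1, 1, 0, 0, 0],
    "frequency_mhz": [118.1, 118.1, 121.5, 121.5, 122.8, 122.8, 123.0],
})


def first_mode(values):
    """Hypothetical helper: smallest mode of a Series, or NaN if there is none."""
    modes = values.mode()  # always a Series, sorted ascending
    return modes.iat[0] if not modes.empty else float("nan")


# mode() hands back a Series even for a single mode; median() gives a scalar.
print(type(data["frequency_mhz"].mode()))
print(type(data["frequency_mhz"].median()))

# One scalar per group, so agg() can assemble its result.
result = data.groupby("type_airport_large_airport")["frequency_mhz"].agg(first_mode)
print(result)
```

In this sample, group 1 has a tie between 118.1 and 121.5, so the helper picks 118.1. If you need every mode per group rather than just the first, you'd have to return something other than a scalar (e.g. a list), which is a different design choice.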
