Home > Back-end >  Python groupby mode() pick last item when having empty, single and multiple array lengths
Python groupby mode() pick last item when having empty, single and multiple array lengths

Time:05-21

I did check for possible solutions, but the most common solutions didn't work.

df_woningen.groupby(['postcode'], dropna=True)['energy_ranking'].agg(pd.Series.mode)

Gives me multiple arrays in this format:

2611BA            []
2611BB           4.0
2611BC    [3.0, 6.0]

QUESTION: How to select the last item to use as value for a new column?

Background: one column has rankings. Per group I want to take the mode() and put it as imputed value for NaN's in that group.

In case of multiple modes I want to take the highest. Sometimes a group has only NaN, in that case it should or could stay like that. If a group has 8 NaN's and 1 ranking '8', that de mode should be 8, disregarding the NaN's.

I am trying to create a new column by using codes like this:

df_woningen.groupby(['postcode'], dropna=True)['energy_ranking'].agg(
    lambda x: pd.Series.mode(x)[0])

Or

df_woningen.groupby(['postcode'], dropna=True)['energy_ranking'].agg(lambda x:x.value_counts(dropna=True).index[0])

But I get errors and I believe it's because of the different lengths of the arrays.

TypeError: 'function' object is not subscriptable

index 0 is out of bounds for axis 0 with size 0

Anyone an idea how to solve this?

CodePudding user response:

Assuming this example:

df = pd.DataFrame({'group': list('AAABBC'), 'value': [1,1,2,1,2,float('nan')]})
s = df.groupby('group')['value'].agg(pd.Series.mode)

Input:

group
A           1.0
B    [1.0, 2.0]
C            []
Name: value, dtype: object

You can use the str accessor and fillna:

s.str[-1].fillna(s.mask(s.str.len().eq(0)))
# or for numbers
# s.str[-1].fillna(pd.to_numeric(s, errors='coerce'))

Output:

group
A    1.0
B    2.0
C    NaN
Name: value, dtype: float64

CodePudding user response:

IIUC you can use use a lambda function in conjunction with a -1 for a list to display the data you are looking for

data = {
    'Column1' : ['2611BA', '2611BB', '2611BC'],
    'Column2' : [[], [4.0], [3.0, 6.0]]
}
df = pd.DataFrame(data)
df['Column3'] = df['Column2'].apply(lambda x : x[-1] if len(x) > 0 else '')
df
  • Related