I did check for possible solutions, but the most common solutions didn't work.
df_woningen.groupby(['postcode'], dropna=True)['energy_ranking'].agg(pd.Series.mode)
Gives me multiple arrays in this format:
2611BA []
2611BB 4.0
2611BC [3.0, 6.0]
QUESTION: How to select the last item to use as value for a new column?
Background: one column has rankings. Per group I want to take the mode() and put it as imputed value for NaN's in that group.
In case of multiple modes I want to take the highest. Sometimes a group has only NaN, in that case it should or could stay like that. If a group has 8 NaN's and 1 ranking '8', that de mode should be 8, disregarding the NaN's.
I am trying to create a new column by using codes like this:
df_woningen.groupby(['postcode'], dropna=True)['energy_ranking'].agg(
lambda x: pd.Series.mode(x)[0])
Or
df_woningen.groupby(['postcode'], dropna=True)['energy_ranking'].agg(lambda x:x.value_counts(dropna=True).index[0])
But I get errors and I believe it's because of the different lengths of the arrays.
TypeError: 'function' object is not subscriptable
index 0 is out of bounds for axis 0 with size 0
Anyone an idea how to solve this?
CodePudding user response:
Assuming this example:
df = pd.DataFrame({'group': list('AAABBC'), 'value': [1,1,2,1,2,float('nan')]})
s = df.groupby('group')['value'].agg(pd.Series.mode)
Input:
group
A 1.0
B [1.0, 2.0]
C []
Name: value, dtype: object
You can use the str
accessor and fillna
:
s.str[-1].fillna(s.mask(s.str.len().eq(0)))
# or for numbers
# s.str[-1].fillna(pd.to_numeric(s, errors='coerce'))
Output:
group
A 1.0
B 2.0
C NaN
Name: value, dtype: float64
CodePudding user response:
IIUC you can use use a lambda function in conjunction with a -1 for a list to display the data you are looking for
data = {
'Column1' : ['2611BA', '2611BB', '2611BC'],
'Column2' : [[], [4.0], [3.0, 6.0]]
}
df = pd.DataFrame(data)
df['Column3'] = df['Column2'].apply(lambda x : x[-1] if len(x) > 0 else '')
df