I have extracted a group of data (grouped by column 'A') from a larger dataset and want to apply a function to each group to verify the function's results. The problem is that when I apply the function to a frame whose grouping column has only one distinct value (df_false), pandas raises
ValueError: Expected a 1D array, got an array with shape (6, 6)
When I apply the same function to a frame with more than one distinct value in the grouping column (df_ok), the error does not appear.
Does anyone know how to deal with this?
import pandas as pd
from IPython.display import display  # display() is only built in inside Jupyter/IPython

df_false = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'a', 'a'],
                         'B': [10, 10, 20, 20, 30, 10],
                         'C': [10, 10, 20, 30, 10, 5]})
df_ok = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'a', 'c'],
                      'B': [10, 10, 20, 20, 30, 10],
                      'C': [10, 10, 20, 30, 10, 5]})
display(df_false)

def myf(x):
    # one value per row of the group, each equal to the group size
    y = []
    for i in x.iterrows():
        y.append(len(x))
    return pd.Series(y)

df_false['result'] = df_false.groupby('A').apply(myf).reset_index(drop=True)  # raises the ValueError
display(df_false)
Answer:
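The issue is that with df_false, df_false.groupby('A').apply(myf) returns a DataFrame with a single row instead of a Series. When every group returns a Series with the same index (which is always the case when there is only one group), pandas stacks those Series as the rows of a DataFrame, and a one-row, six-column DataFrame cannot be stored in a single column. With df_ok, the two groups return Series of different lengths, so pandas concatenates them into one long Series instead. You can force the single-row result into a Series with squeeze: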
df_false['result'] = (df_false.groupby('A').apply(myf)
.reset_index(drop=True).squeeze()
)
That said, unless this is a dummy example, you should use vectorized code instead:
df_false['result'] = df_false.groupby('A')['A'].transform('size')
Output:
   A   B   C  result
0  a  10  10       6
1  a  10  10       6
2  a  20  20       6
3  a  20  30       6
4  a  30  10       6
5  a  10   5       6