pandas apply function on groups gives error when group has only one distinct value

Time:08-18

I extracted a group of data (e.g. column 'A') from a larger dataset and want to apply a function to that group to verify the function's results. The problem is that when I apply the function to a DataFrame whose grouping column has only one distinct value (df_false), pandas raises:

ValueError: Expected a 1D array, got an array with shape (6, 6)

When I apply the same function to a DataFrame that has more than one distinct value in the grouping column (df_ok), the error doesn't appear.

Does anyone know how to deal with that?

import pandas as pd

df_false = pd.DataFrame({'A' : ['a', 'a', 'a', 'a', 'a', 'a'], 
                         'B': [10,10,20,20,30,10], 
                         'C': [10,10,20,30,10,5]})
df_ok = pd.DataFrame({'A' : ['a', 'a', 'a', 'a', 'a', 'c'], 
                      'B': [10,10,20,20,30,10], 
                      'C': [10,10,20,30,10,5]})
display(df_false)

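def myf(x):
    # Return the group's size once for every row in the group
    y = []
    for _ in x.iterrows():
        y.append(len(x))
    return pd.Series(y)

# This assignment raises the ValueError for df_false
df_false['result'] = df_false.groupby('A').apply(myf).reset_index(drop=True)
display(df_false)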

CodePudding user response:

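The issue is that with df_false, groupby.apply returns a DataFrame with a single row (one row for the only group, one column per returned value) instead of a Series, so it cannot be assigned to a single column. You can force it into a Series with squeeze: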

df_false['result'] = (df_false.groupby('A').apply(myf)
                              .reset_index(drop=True).squeeze()
                      )

That said, unless this was a dummy example, you should rather use vectorized code:

df_false['result'] = df_false.groupby('A')['A'].transform('size')

output:

   A   B   C  result
0  a  10  10       6
1  a  10  10       6
2  a  20  20       6
3  a  20  30       6
4  a  30  10       6
5  a  10   5       6
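
As a quick check that the transform approach does not depend on the number of groups, here is the same call on the question's second frame, df_ok. This is only a sketch using the data from the question.

```python
import pandas as pd

df_ok = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'a', 'c'],
                      'B': [10, 10, 20, 20, 30, 10],
                      'C': [10, 10, 20, 30, 10, 5]})

# transform returns one value per original row, aligned on the original index,
# so the result can be assigned directly whether there is one group or many
df_ok['result'] = df_ok.groupby('A')['A'].transform('size')
print(df_ok)
```

The five 'a' rows get 5 and the single 'c' row gets 1, with no reset_index or squeeze needed.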