ValueError: Length mismatch: Expected axis has X elements, new values have Y elements

I am trying to fill each missing value with the most frequent value in its group. Code:

f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
s = df.groupby('VehicleType')['FuelType'].transform(f)
df['FuelType']=df['FuelType'].fillna(s)

Error: ValueError: Length mismatch: Expected axis has 316879 elements, new values have 354369 elements

Possible cause: I think the VehicleType column has missing values and that this is what triggers the error, because the same code works when I group by another column that has no missing values. But I have to use VehicleType for this task.
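One way to check that guess is to compare the number of rows with a missing VehicleType against the gap between the two lengths in the error message. The snippet below is a minimal sketch on a made-up toy frame (the data here is hypothetical, not the real dataset). If the diagnosis is right, the real data would show 354369 - 316879 = 37490 rows with a missing VehicleType.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real data
df = pd.DataFrame({
    "VehicleType": ["a", "b", np.nan, "a", np.nan],
    "FuelType": ["X", np.nan, "Y", "X", np.nan],
})

n_rows = len(df)
# Rows whose grouping key is missing
n_missing_keys = int(df["VehicleType"].isna().sum())
# Rows that actually end up in some group (NaN keys are excluded by default)
n_grouped = int(df.groupby("VehicleType")["FuelType"].size().sum())

print(n_rows, n_missing_keys, n_grouped)
print(n_rows - n_grouped == n_missing_keys)
```

If the last line prints True, the rows missing from the groupby result are exactly the rows with a NaN grouping key.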

CodePudding user response：

This problem appears to have been fixed in newer versions of pandas (it works without issue on 1.4.0). For older versions of pandas, read on.

The issue is caused by NaN values in your grouping column combined with .transform. Rows whose group key is NaN are left out of the groups, so the transformed result is shorter than the original column. To get around this, instead of grouping by the column name, group by a Series in which you first .fillna() the key with some value that doesn't otherwise occur in that column. The rows with NaN 'VehicleType' then form their own group and are assigned the modal 'FuelType' among those rows.

I'll assign the result as a separate column below for illustration.

Sample data to reproduce the problem

import pandas as pd
import numpy as np

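# np.nan is used instead of np.NaN, since the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'VehicleType': ['a', 'b', 'c', 'a', np.nan, np.nan, np.nan, 'a'],
                   'FuelType': ['Y', np.nan, 'Y', 'X', 'Z', 'Z', 'Y', 'X']})
# Most frequent value of the group, or NaN if the group has no non-missing values
f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan

df.groupby('VehicleType')['FuelType'].transform(f)
# On affected (older) pandas versions:
# ValueError: Length mismatch: Expected axis has 5 elements, new values have 8 elements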

Solution

df['FuelType_mode'] = (df.groupby(df['VehicleType'].fillna('SPECIAL_MISSING'))
                         ['FuelType'].transform(f))

print(df)
  VehicleType FuelType FuelType_mode
0           a        Y             X
1           b      NaN           NaN
2           c        Y             Y
3           a        X             X
4         NaN        Z             Z
5         NaN        Z             Z
6         NaN        Y             Z
7           a        X             X
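The workaround above only behaves as intended if the placeholder is not already a real label in the grouping column. A small guard along these lines can confirm that before grouping. This is a sketch: the toy frame and the variable names are made up for illustration, and 'SPECIAL_MISSING' is just the placeholder used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the grouping column matters here
df = pd.DataFrame({"VehicleType": ["a", "b", np.nan, "a"]})

sentinel = "SPECIAL_MISSING"

# Comparing NaN with a string yields False, so only real labels can match
collides = bool(df["VehicleType"].eq(sentinel).any())
if collides:
    raise ValueError(f"{sentinel!r} already occurs in VehicleType; pick another placeholder")

# Safe to use the filled Series as the grouping key
keys = df["VehicleType"].fillna(sentinel)
print(keys.tolist())
```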

With newer versions of pandas, the dropna argument of groupby lets you specify whether rows with a NaN key are ignored entirely when you group, or treated as their own group. Depending on the behavior you want, you would do:

# Rows with a NaN VehicleType still get the modal FuelType of the NaN group.
# Same logic as the workaround above.
df['FT3'] = df.groupby('VehicleType', dropna=False)['FuelType'].transform(f)

# Rows with a NaN VehicleType get NaN (default behavior, NaN keys are dropped)
df['FT4'] = df.groupby('VehicleType')['FuelType'].transform(f)


  VehicleType FuelType FuelType_mode  FT3  FT4
0           a        Y             X    X    X
1           b      NaN           NaN  NaN  NaN
2           c        Y             Y    Y    Y
3           a        X             X    X    X
4         NaN        Z             Z    Z  NaN
5         NaN        Z             Z    Z  NaN
6         NaN        Y             Z    Z  NaN
7           a        X             X    X    X
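To finish the original task, the per-group mode can be used to fill only the missing FuelType values. The following is a minimal end-to-end sketch using the placeholder-key workaround, so it does not depend on the dropna argument. The sample values and the helper name group_mode are made up for illustration and differ slightly from the frame above so that some FuelType values are actually missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in both columns
df = pd.DataFrame({
    "VehicleType": ["a", "b", "c", "a", np.nan, np.nan, np.nan, "a"],
    "FuelType": ["Y", np.nan, "Y", np.nan, "Z", "Z", np.nan, "Y"],
})

def group_mode(x):
    """Most frequent non-missing value of the group, or NaN if there is none."""
    m = x.mode()  # mode() ignores NaN by default
    return m.iat[0] if not m.empty else np.nan

# Group by a key with NaN replaced, so rows with a missing key form their own group
keys = df["VehicleType"].fillna("SPECIAL_MISSING")
modes = df.groupby(keys)["FuelType"].transform(group_mode)

# Only the missing FuelType values are replaced; existing values are kept
df["FuelType"] = df["FuelType"].fillna(modes)
print(df)
```

Row 1 stays NaN because group 'b' has no non-missing FuelType to take a mode from. Such rows would need a separate fallback, for example the overall mode of the column.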