IndexError: Index contains null values when adding dataframe to featuretools EntitySet

Time:01-20

I have a dataframe that I want to add to an EntitySet:

       Unnamed: 0    Year               name Pos   Age   Tm     G    GS  \
24672       24672  2017.0      Troy Williams  SF  22.0  TOT  30.0  16.0   
24675       24675  2017.0       Kyle Wiltjer  PF  24.0  HOU  14.0   0.0   
24688       24688  2017.0  Stephen Zimmerman   C  20.0  ORL  19.0   0.0   
24689       24689  2017.0        Paul Zipser  SF  22.0  CHI  44.0  18.0   
24690       24690  2017.0        Ivica Zubac   C  19.0  LAL  38.0  11.0   

          MP   PER  ...    FT%   ORB    DRB    TRB   AST   STL   BLK   TOV  \
24672  557.0   8.9  ...  0.656  15.0   54.0   69.0  25.0  27.0  10.0  33.0   
24675   44.0   6.7  ...  0.500   4.0    6.0   10.0   2.0   3.0   1.0   5.0   
24688  108.0   7.3  ...  0.600  11.0   24.0   35.0   4.0   2.0   5.0   3.0   
24689  843.0   6.9  ...  0.775  15.0  110.0  125.0  36.0  15.0  16.0  40.0   
24690  609.0  17.0  ...  0.653  41.0  118.0  159.0  30.0  14.0  33.0  30.0   

         PF    PTS  
24672  60.0  185.0  
24675   4.0   13.0  
24688  17.0   23.0  
24689  78.0  240.0  
24690  66.0  284.0  

When I try to add the dataframe to a featuretools EntitySet like this:

entity_set.add_dataframe(dataframe_name="season_stats",
                         dataframe=season_stats,
                         index="name")

I get the following error:

    /usr/local/lib/python3.8/dist-packages/woodwork/table_accessor.py in _check_index(dataframe, index)
   1694 
   1695         if dataframe[index].isnull().any():
-> 1696             raise IndexError("Index contains null values")
   1697 
   1698 

IndexError: Index contains null values

What am I doing wrong?

Answer:

Columns used as index columns in Featuretools cannot contain missing (null) values and the values must be unique. Based on the error message you are seeing, it seems that the name column you are attempting to use as an index has null values.
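To confirm that this is the cause before changing anything, you can count the null and duplicated values in the candidate index column with plain pandas. The sketch below is only an illustration: the helper name `summarize_index_problems` and the toy data are made up for this example and are not part of Featuretools or Woodwork.

```python
import pandas as pd


def summarize_index_problems(df: pd.DataFrame, column: str) -> dict:
    """Count null values and repeated non-null values in a candidate index column."""
    values = df[column]
    return {
        # Rows where the candidate index value is missing
        "null_count": int(values.isnull().sum()),
        # Non-null values that repeat an earlier value in the column
        "duplicate_count": int(values.dropna().duplicated().sum()),
    }


# Toy data (hypothetical, not the real season_stats): one missing name, one repeated name
toy = pd.DataFrame(
    {
        "name": ["Player A", None, "Player B", "Player B"],
        "PTS": [10.0, 20.0, 30.0, 40.0],
    }
)

problems = summarize_index_problems(toy, "name")
print(problems)
```

If either count is non-zero for your own `season_stats["name"]`, the column cannot be used as an index as it stands.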

You will need to drop these rows from your dataframe prior to adding the dataframe to the Featuretools EntitySet. You can drop all rows in your dataframe for which name is null with this:

season_stats = season_stats.dropna(subset=["name"])
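Since the index values must also be unique, it may be worth checking uniqueness right after dropping the null rows. A minimal sketch, assuming pandas only; the helper name `prepare_index_column` and the toy data are hypothetical and stand in for your real dataframe:

```python
import pandas as pd


def prepare_index_column(df: pd.DataFrame, column: str):
    """Drop rows with a null index value and report whether the rest is unique."""
    cleaned_df = df.dropna(subset=[column])
    dropped_rows = len(df) - len(cleaned_df)
    unique = bool(cleaned_df[column].is_unique)
    return cleaned_df, dropped_rows, unique


# Toy data (hypothetical): one row has no name
toy_stats = pd.DataFrame(
    {
        "name": ["Player A", None, "Player B"],
        "PTS": [10.0, 20.0, 30.0],
    }
)

cleaned, dropped, is_unique = prepare_index_column(toy_stats, "name")
print(f"dropped {dropped} row(s); index unique: {is_unique}")
```

If the uniqueness check comes back False on your data, dropping nulls alone will not be enough, and you would need a different column (or a combination of columns) to serve as the index.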