I want to create a dataframe from specific input columns, but I get the following error when executing the code.
Let me explain the sequence:
Checking the columns in train_df.
Code:
train_df.columns
Output:
Index(['fare_amount', 'pickup_datetime', 'pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude', 'passenger_count', 'pickup_datetime_year', 'pickup_datetime_month', 'pickup_datetime_day', 'pickup_datetime_weekday', 'pickup_datetime_hour', 'trip_distance', 'jkf_drop_distance', 'lga_drop_distance', 'ewr_drop_distance', 'met_drop_distance', 'wtc_drop_distance'], dtype='object')
Selecting only the input columns required for the model.
Code:
input_cols = ['pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude', 'passenger_count', 'pickup_datetime_year', 'pickup_datetime_month', 'pickup_datetime_day', 'pickup_datetime_weekday', 'pickup_datetime_hour', 'trip_distance', 'jfk_drop_distance', 'lga_drop_distance', 'ewr_drop_distance', 'met_drop_distance', 'wtc_drop_distance']
Creating the training dataframe from the columns selected above.
Code:
train_inputs = train_df[input_cols]
I'm getting the error in the 3rd step. The traceback is:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-111-7f39184b2836> in <module>
----> 1 train_inputs = train_df[input_cols]

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3462             if is_iterator(key):
   3463                 key = list(key)
-> 3464             indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3465
   3466         # take() does not accept boolean indexers

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1375
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 1377             raise KeyError(f"{not_found} not in index")
   1378
   1379

KeyError: "['jfk_drop_distance'] not in index"
CodePudding user response:
You need to ensure that every item in input_cols is also in train_df.columns. 'jfk_drop_distance' is not: the dataframe column is spelled 'jkf_drop_distance'. The columns of train_df that input_cols does not select are ['fare_amount', 'pickup_datetime', 'jkf_drop_distance'].
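The check described above (every name in input_cols must appear in train_df.columns) can be sketched roughly as follows. The two lists are copied from the question; with a real dataframe you would presumably use list(train_df.columns) instead of the hard-coded train_columns. The use of difflib to suggest a near match is my own addition, not something pandas does for you.

```python
import difflib

# Column names copied from the train_df.columns output in the question.
train_columns = [
    'fare_amount', 'pickup_datetime', 'pickup_longitude', 'pickup_latitude',
    'dropoff_longitude', 'dropoff_latitude', 'passenger_count',
    'pickup_datetime_year', 'pickup_datetime_month', 'pickup_datetime_day',
    'pickup_datetime_weekday', 'pickup_datetime_hour', 'trip_distance',
    'jkf_drop_distance', 'lga_drop_distance', 'ewr_drop_distance',
    'met_drop_distance', 'wtc_drop_distance',
]

# Names the question tries to select.
input_cols = [
    'pickup_longitude', 'pickup_latitude', 'dropoff_longitude',
    'dropoff_latitude', 'passenger_count', 'pickup_datetime_year',
    'pickup_datetime_month', 'pickup_datetime_day', 'pickup_datetime_weekday',
    'pickup_datetime_hour', 'trip_distance', 'jfk_drop_distance',
    'lga_drop_distance', 'ewr_drop_distance', 'met_drop_distance',
    'wtc_drop_distance',
]

# Requested names that the dataframe does not have.
missing = [col for col in input_cols if col not in train_columns]
print(missing)  # ['jfk_drop_distance']

# For each missing name, look up the most similar existing column name.
for col in missing:
    suggestions = difflib.get_close_matches(col, train_columns, n=1)
    print(col, '->', suggestions)
```

Running a check like this before indexing makes a spelling mismatch easier to spot than reading the KeyError alone.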
CodePudding user response:
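Only one column from your input_cols doesn't exist in train_df (hence why you're getting that error):
'jfk_drop_distance'
The dataframe spells it 'jkf_drop_distance'. 'fare_amount' is in train_df but not in input_cols, and 'dropoff_latitude' is present in both, so neither of those causes the error.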
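One possible way to resolve the mismatch between 'jfk_drop_distance' in input_cols and 'jkf_drop_distance' in the dataframe is to rename the misspelled column before selecting. This is only a sketch: the small train_df below is a made-up stand-in with three of the columns, and its values are invented for illustration.

```python
import pandas as pd

# Tiny stand-in for train_df; the values are made up for illustration only.
train_df = pd.DataFrame({
    'trip_distance': [1.2, 3.4],
    'jkf_drop_distance': [20.1, 18.7],  # misspelled, as in the question
    'lga_drop_distance': [9.8, 11.3],
})

# Rename the misspelled column so it matches the name used in input_cols.
train_df = train_df.rename(columns={'jkf_drop_distance': 'jfk_drop_distance'})

input_cols = ['trip_distance', 'jfk_drop_distance', 'lga_drop_distance']

# Every requested name now exists, so this no longer raises KeyError.
train_inputs = train_df[input_cols]
print(train_inputs.columns.tolist())
```

Alternatively, you could leave the dataframe alone and write 'jkf_drop_distance' in input_cols, or fix the typo at the point where that column is first created.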