Pandas - 'returning-a-view-versus-a-copy' error when renaming categorical values at each l

Time:11-22

I am getting an error:

<ipython-input-309-00fb859fe0ab>:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  train_set['month'] = train_set['month'].map({1: 'jan', 2: 'feb', 3: 'mar', 4: 'apr',  5: 'may',

when trying to remap the numeric values of a categorical variable. NaNs are also filling the column instead of the values I am mapping the original values to.

How would one go about this? I also tried using loc and iloc, as in data.loc[:, 'month'] = data.loc[:, 'month'].map(...) and data.month = data.month.map(...), instead of data['month'] = data['month'].map(...), but neither worked. I added an example using the bank marketing data from UCI to show my error.

Example

import pandas as pd

# import raw dataset from URL (same as data provided, just put in a git repository for ease)
DATA_URL = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'
raw_dataset = pd.read_csv(DATA_URL, sep=';', skipinitialspace=True, index_col=None)

data = raw_dataset.copy()

# rename variables for masking
data['month'] = data['month'].map({1: 'jan', 2: 'feb', 3: 'mar', 4: 'apr', 5: 'may',
                                   6: 'jun', 7: 'jul', 8: 'aug', 9: 'sep', 10: 'oct',
                                   11: 'nov', 12: 'dec'})
# print to see error
data['month']

The original data had these values:

>>> np.unique(raw_dataset['month'])

array(['apr', 'aug', 'dec', 'jul', 'jun', 'mar', 'may', 'nov', 'oct',
       'sep'], dtype=object)

CodePudding user response:

No need to map.

Dates and datetimes are well supported in Python. Capitalise the first letter of each month name, then parse it as a datetime and take the month number. Code below.

data['month'] = pd.to_datetime(data['month'].str.capitalize(), format='%b').dt.month
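
As a minimal sketch of the same idea on a small, made-up Series (the sample values and variable names here are hypothetical, and parsing '%b' assumes an English locale for month abbreviations):

```python
import pandas as pd

# Hypothetical sample of abbreviated month names, as stored in the 'month' column.
months = pd.Series(["may", "jun", "nov"])

# Parse each abbreviation with the '%b' format; the year and day fall back to defaults,
# which does not matter because only the month is kept.
parsed = pd.to_datetime(months.str.capitalize(), format="%b")

# Extract the month number from each parsed timestamp.
month_numbers = parsed.dt.month
print(month_numbers.tolist())
```

This gives integers rather than names, so it only fits if numeric months are what you want in the column.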

CodePudding user response:

Your dictionary should have the month names (jan, feb, ...) as keys and the numbers as values.

data['month'] = data['month'].map({'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6, 'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12})
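
The following is a rough sketch of both symptoms from the question on a toy frame. The frame, the age filter, and the names raw, wrong and month_to_num are made up for illustration. It assumes train_set was created by filtering another DataFrame, which is a common cause of SettingWithCopyWarning; taking an explicit .copy() is one way to avoid it.

```python
import pandas as pd

# Hypothetical toy frame standing in for the bank marketing data.
raw = pd.DataFrame({"month": ["may", "jun", "may", "nov"],
                    "age": [30, 41, 56, 23]})

# An explicit copy makes train_set an independent frame, so assigning a
# column to it is not treated as writing to a slice of raw.
train_set = raw[raw["age"] > 25].copy()

# Keys that never occur in the column (integers here) map every row to NaN.
wrong = train_set["month"].map({5: "may", 6: "jun", 11: "nov"})
print(wrong.isna().all())

# Keys that match the stored strings give the intended numbers.
month_to_num = {"may": 5, "jun": 6, "nov": 11}
train_set["month"] = train_set["month"].map(month_to_num)
print(train_set["month"].tolist())
```

Series.map returns NaN for any value that is not a key of the dictionary, which is why the whole column became NaN when the keys were integers and the data held strings.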