I am getting a warning:
<ipython-input-309-00fb859fe0ab>:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
train_set['month'] = train_set['month'].map({1: 'jan', 2: 'feb', 3: 'mar', 4: 'apr', 5: 'may',
when trying to remap the values of a categorical variable with numeric values. NaNs also fill the column instead of the values I am mapping to.
How would one go about this? I also tried using loc and iloc, as data.loc[:,'month'] = data.loc[:,'month'].map() or data.month = data.month.map() instead of data['month'] = data['month'].map()
but neither worked. I added an example using the bank marketing data from UCI to show my error.
Example
import numpy as np
import pandas as pd
# import raw dataset from URL (same as data provided, just put in a git repository for ease)
DATA_URL = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'
raw_dataset = pd.read_csv(DATA_URL, sep=';', skipinitialspace=True, index_col=None)
data = raw_dataset.copy()
# rename variables for masking
data['month'] = data['month'].map({1: 'jan', 2: 'feb', 3: 'mar', 4: 'apr', 5: 'may',
                                   6: 'jun', 7: 'jul', 8: 'aug', 9: 'sep', 10: 'oct',
                                   11: 'nov', 12: 'dec'})
# print to see error
data['month']
The original data had these values:
>> np.unique(raw_dataset['month'])
array(['apr', 'aug', 'dec', 'jul', 'jun', 'mar', 'may', 'nov', 'oct',
'sep'], dtype=object)
CodePudding user response:
No need to map.
Dates and datetimes are well supported in Python. Capitalise the first letter of each month name, then parse it as a datetime and take the month number. Code below:
data['month'] = pd.to_datetime(data['month'].str.capitalize(), format='%b').dt.month
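As a quick check, here is a minimal sketch of the same idea on a toy Series. The sample values are made up for illustration rather than read from the dataset, and it assumes an English locale, since %b parses locale-dependent month abbreviations.

```python
import pandas as pd

# Hypothetical sample standing in for data['month']
months = pd.Series(['may', 'jun', 'nov', 'apr'])

# Parse the abbreviated month names (%b), then keep only the month number
month_numbers = pd.to_datetime(months.str.capitalize(), format='%b').dt.month

print(month_numbers.tolist())
```

The parsed datetimes carry a placeholder year and day, so only the .dt.month part is meaningful here.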
CodePudding user response:
Your dictionary should have months as keys (jan/feb) and numbers as values!
data['month'] = data['month'].map({'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6, 'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12})
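The NaNs come from how Series.map handles a dict: any value that is not found among the dict's keys becomes NaN. A small sketch on a made-up Series (not the real dataset) shows both directions:

```python
import pandas as pd

# Hypothetical sample standing in for data['month']
months = pd.Series(['may', 'jun', 'nov'])

# Keys are numbers but the column holds strings, so no key matches
wrong = months.map({5: 'may', 6: 'jun', 11: 'nov'})
print(wrong.isna().all())

# Keys match the strings in the column, so every value is translated
right = months.map({'may': 5, 'jun': 6, 'nov': 11})
print(right.tolist())
```

The SettingWithCopyWarning is probably a separate issue. If train_set was created by slicing another DataFrame, taking an explicit .copy() of that slice before assigning to it is the usual way to avoid the warning.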