originally this was the dtype of my df:
All Markets object
Category object
Brand object
Target Age object
Segment object
Sub_brand object
Form object
Week object
Sales float64
Units float64
TDP float64
No_of_Stores float64
dtype: object
The "Week" column originally looked like this:
01/06/18
01/13/18
01/20/18
I converted the Week column into DateTime by writing:
df['Week']=pd.to_datetime(df['Week'])
Now the dtype is this:
All Markets object
Category object
Brand object
Target Age object
Segment object
Sub_brand object
Form object
Week datetime64[ns]
Sales float64
Units float64
TDP float64
No_of_Stores float64
dtype: object
I created a new dataframe where I summed the dataset by Week:
df_all_sales=df.groupby(["Week"]).sum()
df_all_sales
Week Sales Units TDP No_of_Stores
2018-01-06 3.524456e 07 2328906.175 1860.108 1068546.48
2018-01-13 3.108469e 07 2045011.831 1745.664 1068606.48
2018-01-20 2.603041e 07 1748838.880 1631.943 1067000.64
2018-01-27 2.453881e 07 1582999.340 1582.581 1067461.32
2018-02-03 2.440598e 07 1639932.560 1584.747 1067419.68
... ... ... ... ...
2020-06-27 6.205612e 06 373815.090 704.343 998781.74
2020-07-04 7.054332e 06 427955.540 779.252 999300.06
2020-07-11 7.137108e 06 438363.230 783.708 998931.23
2020-07-18 7.545068e 06 465413.700 822.505 998794.83
2020-07-25 7.329634e 06 458153.310 807.745 998794.83
134 rows × 4 columns
But when I'm trying to extract the Year, Month and date from the Week column, it is showing error. I tried these codes:
import datetime as dt
df_all_sales['year'] = pd.DatetimeIndex(df_all_sales['Week']).year
df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
This is the error:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Week'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-29-06aaa1163a18> in <module>
1 # extracting date, month and year from the datetime
----> 2 df_all_sales['year'] = pd.DatetimeIndex(df_all_sales['Week']).year
3 df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
4
5 #df_all_sales['year'] = df_all_sales['Week'].dt.year
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'Week'
Also I tried these codes:
df_all_sales['year'] = df_all_sales['Week'].dt.year
df_all_sales['month'] = df_all_sales['Week'].dt.month
This is the error:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Week'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-30-06b216de9275> in <module>
3 #df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
4
----> 5 df_all_sales['year'] = df_all_sales['Week'].dt.year
6 df_all_sales['month'] = df_all_sales['Week'].dt.month
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'Week'
CodePudding user response:
After your groupby
, "Week" likely became the index for your DataFrame, so you would first need to reset_index
before trying to access a "Week" column:
df_all_sales = df_all_sales.reset_index()
df_all_sales["Year"] = df_all_sales["Week"].dt.year
df_all_sales["Month"] = df_all_sales["Week"].dt.month
df_all_sales["Week"] = df_all_sales["Week"].dt.week
Alternatively, if you want to access this from your DatetimeIndex, you can do:
df_all_sales["Year"] = df_all_sales.index.year
df_all_sales["Month"] = df_all_sales.index.month
df_all_sales["Week"] = df_all_sales.index.isocalendar().week
CodePudding user response:
Use dt
accessor:
df_all_sales['Week'] = pd.to_datetime(df['Week'])
df_all_sales['year'] = df_all_sales['Week'].dt.year
df_all_sales['month'] = df_all_sales['Week'].dt.month
Example:
# Before:
>>> df
Week
0 01/06/18
1 01/13/18
2 01/20/18
# After:
>>> df
Week year month
0 2018-01-06 2018 1
1 2018-01-13 2018 1
2 2018-01-20 2018 1
Try to do that before groupby
else Week
is not a column anymore but the index
else:
df_all_sales['Week'] = pd.to_datetime(df['Week'])
df_all_sales.groupby(["Week"]).sum()
df_all_sales['year'] = df_all_sales.index.year
df_all_sales['month'] = df_all_sales.index.month
Updated with the help of @HenryEcker