Home > Net >  How to extract Year, Week and Month from the Datetime column in my data?
How to extract Year, Week and Month from the Datetime column in my data?

Time:10-21

originally this was the dtype of my df:

All Markets      object
Category         object
Brand            object
Target Age       object
Segment          object
Sub_brand        object
Form             object
Week             object
Sales           float64
Units           float64
TDP             float64
No_of_Stores    float64
dtype: object

The "Week" column originally looked like this:

01/06/18
01/13/18
01/20/18

I converted the Week column into DateTime by writing:

df['Week']=pd.to_datetime(df['Week'])

Now the dtype is this:

All Markets             object
Category                object
Brand                   object
Target Age              object
Segment                 object
Sub_brand               object
Form                    object
Week            datetime64[ns]
Sales                  float64
Units                  float64
TDP                    float64
No_of_Stores           float64
dtype: object

I created a new dataframe where I summed the dataset by Week:

df_all_sales=df.groupby(["Week"]).sum()
df_all_sales


Week        Sales           Units       TDP         No_of_Stores        
2018-01-06  3.524456e 07    2328906.175 1860.108    1068546.48
2018-01-13  3.108469e 07    2045011.831 1745.664    1068606.48
2018-01-20  2.603041e 07    1748838.880 1631.943    1067000.64
2018-01-27  2.453881e 07    1582999.340 1582.581    1067461.32
2018-02-03  2.440598e 07    1639932.560 1584.747    1067419.68
... ... ... ... ...
2020-06-27  6.205612e 06    373815.090  704.343 998781.74
2020-07-04  7.054332e 06    427955.540  779.252 999300.06
2020-07-11  7.137108e 06    438363.230  783.708 998931.23
2020-07-18  7.545068e 06    465413.700  822.505 998794.83
2020-07-25  7.329634e 06    458153.310  807.745 998794.83
134 rows × 4 columns

But when I'm trying to extract the Year, Month and date from the Week column, it is showing error. I tried these codes:

import datetime as dt    
df_all_sales['year'] = pd.DatetimeIndex(df_all_sales['Week']).year
df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month

This is the error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Week'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-29-06aaa1163a18> in <module>
      1 # extracting date, month and year from the datetime
----> 2 df_all_sales['year'] = pd.DatetimeIndex(df_all_sales['Week']).year
      3 df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
      4 
      5 #df_all_sales['year'] = df_all_sales['Week'].dt.year

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Week'

Also I tried these codes:

df_all_sales['year'] = df_all_sales['Week'].dt.year
df_all_sales['month'] = df_all_sales['Week'].dt.month

This is the error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Week'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-06b216de9275> in <module>
      3 #df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
      4 
----> 5 df_all_sales['year'] = df_all_sales['Week'].dt.year
      6 df_all_sales['month'] = df_all_sales['Week'].dt.month

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Week'

CodePudding user response:

After your groupby, "Week" likely became the index for your DataFrame, so you would first need to reset_index before trying to access a "Week" column:

df_all_sales = df_all_sales.reset_index()
df_all_sales["Year"] = df_all_sales["Week"].dt.year
df_all_sales["Month"] = df_all_sales["Week"].dt.month
df_all_sales["Week"] = df_all_sales["Week"].dt.week

Alternatively, if you want to access this from your DatetimeIndex, you can do:

df_all_sales["Year"] = df_all_sales.index.year
df_all_sales["Month"] = df_all_sales.index.month
df_all_sales["Week"] = df_all_sales.index.isocalendar().week

CodePudding user response:

Use dt accessor:

df_all_sales['Week'] = pd.to_datetime(df['Week'])
df_all_sales['year'] = df_all_sales['Week'].dt.year
df_all_sales['month'] = df_all_sales['Week'].dt.month

Example:

# Before:
>>> df
       Week
0  01/06/18
1  01/13/18
2  01/20/18

# After:
>>> df
        Week  year  month
0 2018-01-06  2018      1
1 2018-01-13  2018      1
2 2018-01-20  2018      1

Try to do that before groupby else Week is not a column anymore but the index

else:

df_all_sales['Week'] = pd.to_datetime(df['Week'])
df_all_sales.groupby(["Week"]).sum()

df_all_sales['year'] = df_all_sales.index.year
df_all_sales['month'] = df_all_sales.index.month

Updated with the help of @HenryEcker

  • Related