I have a Pandas dataframe that looks like this:
df.head()
Date Abscount Year Quarter Month Week Number
0 2022-01-03 7.0 2022 1 1 1
1 2022-01-04 17.0 2022 1 1 1
2 2022-01-05 16.0 2022 1 1 1
3 2022-01-06 18.0 2022 1 1 1
4 2022-01-07 18.0 2022 1 1 1
There are a few rows with dates corresponding to the weekends, that I want to remove.
I am trying the following code to duplicate the dataframe with another column that shows the day of the week and then plan to drop those rows with a condition.
However, the following code does not work:
df = df[df['Date'].dt.day_name()]
Error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
c:\Users\013555\Desktop\Time Series Forecasting for Absence\Time series_absence_forecast.ipynb Cell 21' in <cell line: 1>()
----> 1 df = df[df['Date'].dt.day_name()]
File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\frame.py:3511, in DataFrame.__getitem__(self, key)
3509 if is_iterator(key):
3510 key = list(key)
-> 3511 indexer = self.columns._get_indexer_strict(key, "columns")[1]
3513 # take() does not accept boolean indexers
3514 if getattr(indexer, "dtype", None) == bool:
File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:5782, in Index._get_indexer_strict(self, key, axis_name)
5779 else:
5780 keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 5782 self._raise_if_missing(keyarr, indexer, axis_name)
5784 keyarr = self.take(indexer)
5785 if isinstance(key, Index):
5786 # GH 42790 - Preserve name from an Index
File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:5842, in Index._raise_if_missing(self, key, indexer, axis_name)
5840 if use_interval_msg:
5841 key = list(key)
-> 5842 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
5844 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
5845 raise KeyError(f"{not_found} not in index")
KeyError: "None of [Index(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',\n 'Sunday', 'Monday', 'Tuesday', 'Wednesday',\n ...\n 'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',\n 'Friday', 'Saturday', 'Sunday', 'Monday'],\n dtype='object', length=141)] are in the [columns]"
Please help. Where am I going wrong?
CodePudding user response:
What you want to do is simply:
df['new_col_name'] = df['Date'].dt.day_name()
However, if you only need the column for a condition, you don't need to add it to the DataFrame. You can use it to filter it directly:
# Example: remove weekends.
df = df[~df['Date'].dt.day_name().isin(['Saturday', 'Sunday'])]
Instead, your line of code was trying to filter the DataFrame with the result of df['Date'].dt.day_name()
. Of course, the index of df
did not contain the values resulting from it.
Note that using DatetimeIndex.weekday
should be faster:
df = df[df['Date'].weekday.lt(5)]
CodePudding user response:
Found the answer. I just had to add a column using the following code:
df["Day"] = df["Date"].dt.day_name()