Home > other >  Removing rows with weekends from dataframe in Pandas
Removing rows with weekends from dataframe in Pandas

Time:05-24

I have a Pandas dataframe that looks like this:

df.head()

        Date    Abscount    Year    Quarter Month   Week Number
0   2022-01-03  7.0     2022    1   1   1
1   2022-01-04  17.0    2022    1   1   1
2   2022-01-05  16.0    2022    1   1   1
3   2022-01-06  18.0    2022    1   1   1
4   2022-01-07  18.0    2022    1   1   1

There are a few rows with dates corresponding to the weekends, that I want to remove.

I am trying the following code to duplicate the dataframe with another column that shows the day of the week and then plan to drop those rows with a condition.

However, the following code does not work:

df = df[df['Date'].dt.day_name()]

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\Users\013555\Desktop\Time Series Forecasting for Absence\Time series_absence_forecast.ipynb Cell 21' in <cell line: 1>()
----> 1 df = df[df['Date'].dt.day_name()]

File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\frame.py:3511, in DataFrame.__getitem__(self, key)
   3509     if is_iterator(key):
   3510         key = list(key)
-> 3511     indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3513 # take() does not accept boolean indexers
   3514 if getattr(indexer, "dtype", None) == bool:

File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:5782, in Index._get_indexer_strict(self, key, axis_name)
   5779 else:
   5780     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 5782 self._raise_if_missing(keyarr, indexer, axis_name)
   5784 keyarr = self.take(indexer)
   5785 if isinstance(key, Index):
   5786     # GH 42790 - Preserve name from an Index

File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:5842, in Index._raise_if_missing(self, key, indexer, axis_name)
   5840     if use_interval_msg:
   5841         key = list(key)
-> 5842     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   5844 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
   5845 raise KeyError(f"{not_found} not in index")

KeyError: "None of [Index(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',\n       'Sunday', 'Monday', 'Tuesday', 'Wednesday',\n       ...\n       'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',\n       'Friday', 'Saturday', 'Sunday', 'Monday'],\n      dtype='object', length=141)] are in the [columns]"

Please help. Where am I going wrong?

CodePudding user response:

What you want to do is simply:

df['new_col_name'] = df['Date'].dt.day_name()

However, if you only need the column for a condition, you don't need to add it to the DataFrame. You can use it to filter it directly:

# Example: remove weekends.
df = df[~df['Date'].dt.day_name().isin(['Saturday', 'Sunday'])]

Instead, your line of code was trying to filter the DataFrame with the result of df['Date'].dt.day_name(). Of course, the index of df did not contain the values resulting from it.

Note that using DatetimeIndex.weekday should be faster:

df = df[df['Date'].weekday.lt(5)]

CodePudding user response:

Found the answer. I just had to add a column using the following code:

df["Day"] = df["Date"].dt.day_name()
  • Related