Home > Blockchain >  Filter all rows in a DataFrame with a selected Month (Pandas Datetime)
Filter all rows in a DataFrame with a selected Month (Pandas Datetime)

Time:12-18

I have the following set of dates and I'm looking to filter all the rows if it has '2022-02' in it.

            Sunday      Monday      Tuesday     Wednesday   Thursday    Friday      Saturday
January     2021-12-26  2021-12-27  2021-12-28  2021-12-29  2021-12-30  2021-12-31  2022-01-01
January     2022-01-02  2022-01-03  2022-01-04  2022-01-05  2022-01-06  2022-01-07  2022-01-08
January     2022-01-09  2022-01-10  2022-01-11  2022-01-12  2022-01-13  2022-01-14  2022-01-15
January     2022-01-16  2022-01-17  2022-01-18  2022-01-19  2022-01-20  2022-01-21  2022-01-22
January     2022-01-23  2022-01-24  2022-01-25  2022-01-26  2022-01-27  2022-01-28  2022-01-29
February    2022-01-30  2022-01-31  2022-02-01  2022-02-02  2022-02-03  2022-02-04  2022-02-05
February    2022-02-06  2022-02-07  2022-02-08  2022-02-09  2022-02-10  2022-02-11  2022-02-12
February    2022-02-13  2022-02-14  2022-02-15  2022-02-16  2022-02-17  2022-02-18  2022-02-19
February    2022-02-20  2022-02-21  2022-02-22  2022-02-23  2022-02-24  2022-02-25  2022-02-26
March       2022-02-27  2022-02-28  2022-03-01  2022-03-02  2022-03-03  2022-03-04  2022-03-05
March       2022-03-06  2022-03-07  2022-03-08  2022-03-09  2022-03-10  2022-03-11  2022-03-12
March       2022-03-13  2022-03-14  2022-03-15  2022-03-16  2022-03-17  2022-03-18  2022-03-19
March       2022-03-20  2022-03-21  2022-03-22  2022-03-23  2022-03-24  2022-03-25  2022-03-26
April       2022-03-27  2022-03-28  2022-03-29  2022-03-30  2022-03-31  2022-04-01  2022-04-02
April       2022-04-03  2022-04-04  2022-04-05  2022-04-06  2022-04-07  2022-04-08  2022-04-09
....

I am trying to filter out the rows with '2022-02'. As shown below: PS this DataFrame has a row of March because '2022-02' is also present in that row.

            Sunday      Monday      Tuesday     Wednesday   Thursday    Friday      Saturday
February    2022-01-30  2022-01-31  2022-02-01  2022-02-02  2022-02-03  2022-02-04  2022-02-05
February    2022-02-06  2022-02-07  2022-02-08  2022-02-09  2022-02-10  2022-02-11  2022-02-12
February    2022-02-13  2022-02-14  2022-02-15  2022-02-16  2022-02-17  2022-02-18  2022-02-19
February    2022-02-20  2022-02-21  2022-02-22  2022-02-23  2022-02-24  2022-02-25  2022-02-26
March       2022-02-27  2022-02-28  2022-03-01  2022-03-02  2022-03-03  2022-03-04  2022-03-05

I've tried these methods, without any success.

calendar.loc[(calendar >= '2022-02-01') & (calendar < '2020-03-01')]
calendar[calendar == '2022-02-01']

CodePudding user response:

Based on your comments -

yes they are strings

IIUC, these dates are strings, you are trying to treat them as a date variable while searching for strings). The ideal way to solve this would be to convert everything to datetime and then use pd.series.dt. methods but, since all your data is already strings and this is just a simple string search problem, feel free to just use the following.

  1. row.str.find('2022-02')>=0) this checks if, in a given row, it finds the given string.
  2. calender.apply(lambda row ... applies it to each row independently.
  3. .any(1) takes an OR between the True/False values for each row, to give a True, if any of the rows contain 2022-02
  4. calender[...] using the above boolean series to get only those rows that you want.

Hope this explains the code below -

calender[calender.apply(lambda row: row.str.find('2022-02')>=0).any(1)]
              Sunday      Monday     Tuesday   Wednesday    Thursday  \
February  2022-01-30  2022-01-31  2022-02-01  2022-02-02  2022-02-03   
February  2022-02-06  2022-02-07  2022-02-08  2022-02-09  2022-02-10   
February  2022-02-13  2022-02-14  2022-02-15  2022-02-16  2022-02-17   
February  2022-02-20  2022-02-21  2022-02-22  2022-02-23  2022-02-24   
March     2022-02-27  2022-02-28  2022-03-01  2022-03-02  2022-03-03   

              Friday    Saturday  
February  2022-02-04  2022-02-05  
February  2022-02-11  2022-02-12  
February  2022-02-18  2022-02-19  
February  2022-02-25  2022-02-26  
March     2022-03-04  2022-03-05  

Avoid comparing date strings as 2022-01-10 > 2022-02-11

Just to add here, for posterity, comparing dates as strings using > or <= is not a good practice, and its discussed in detail on this popular SO post here.

Furthermore, doing string comparisons only compares Unicode values for each character, and is prone to failed cases such as this '10/15/13' > '10/12/14'

So I would highly recommend avoiding these types of coding practices.

CodePudding user response:

Use from this code


df[(df.Saturday <= '2022-03-06')&(df.Sunday >= '2022-01-26')]

it is very simple and fast If your dataset is large before this line you can use from convert_dtypes method(some times pandas use from object for data type and it is very slow)

  • Related