I have the following set of dates and I'm looking to filter all the rows if it has '2022-02' in it.
Sunday Monday Tuesday Wednesday Thursday Friday Saturday
January 2021-12-26 2021-12-27 2021-12-28 2021-12-29 2021-12-30 2021-12-31 2022-01-01
January 2022-01-02 2022-01-03 2022-01-04 2022-01-05 2022-01-06 2022-01-07 2022-01-08
January 2022-01-09 2022-01-10 2022-01-11 2022-01-12 2022-01-13 2022-01-14 2022-01-15
January 2022-01-16 2022-01-17 2022-01-18 2022-01-19 2022-01-20 2022-01-21 2022-01-22
January 2022-01-23 2022-01-24 2022-01-25 2022-01-26 2022-01-27 2022-01-28 2022-01-29
February 2022-01-30 2022-01-31 2022-02-01 2022-02-02 2022-02-03 2022-02-04 2022-02-05
February 2022-02-06 2022-02-07 2022-02-08 2022-02-09 2022-02-10 2022-02-11 2022-02-12
February 2022-02-13 2022-02-14 2022-02-15 2022-02-16 2022-02-17 2022-02-18 2022-02-19
February 2022-02-20 2022-02-21 2022-02-22 2022-02-23 2022-02-24 2022-02-25 2022-02-26
March 2022-02-27 2022-02-28 2022-03-01 2022-03-02 2022-03-03 2022-03-04 2022-03-05
March 2022-03-06 2022-03-07 2022-03-08 2022-03-09 2022-03-10 2022-03-11 2022-03-12
March 2022-03-13 2022-03-14 2022-03-15 2022-03-16 2022-03-17 2022-03-18 2022-03-19
March 2022-03-20 2022-03-21 2022-03-22 2022-03-23 2022-03-24 2022-03-25 2022-03-26
April 2022-03-27 2022-03-28 2022-03-29 2022-03-30 2022-03-31 2022-04-01 2022-04-02
April 2022-04-03 2022-04-04 2022-04-05 2022-04-06 2022-04-07 2022-04-08 2022-04-09
....
I am trying to filter out the rows with '2022-02'. As shown below: PS this DataFrame has a row of March because '2022-02' is also present in that row.
Sunday Monday Tuesday Wednesday Thursday Friday Saturday
February 2022-01-30 2022-01-31 2022-02-01 2022-02-02 2022-02-03 2022-02-04 2022-02-05
February 2022-02-06 2022-02-07 2022-02-08 2022-02-09 2022-02-10 2022-02-11 2022-02-12
February 2022-02-13 2022-02-14 2022-02-15 2022-02-16 2022-02-17 2022-02-18 2022-02-19
February 2022-02-20 2022-02-21 2022-02-22 2022-02-23 2022-02-24 2022-02-25 2022-02-26
March 2022-02-27 2022-02-28 2022-03-01 2022-03-02 2022-03-03 2022-03-04 2022-03-05
I've tried these methods, without any success.
calendar.loc[(calendar >= '2022-02-01') & (calendar < '2020-03-01')]
calendar[calendar == '2022-02-01']
CodePudding user response:
Based on your comments -
yes they are strings
IIUC, these dates are strings, you are trying to treat them as a date variable while searching for strings). The ideal way to solve this would be to convert everything to datetime and then use pd.series.dt.
methods but, since all your data is already strings and this is just a simple string search problem, feel free to just use the following.
row.str.find('2022-02')>=0)
this checks if, in a given row, it finds the given string.calender.apply(lambda row ...
applies it to each row independently..any(1)
takes anOR
between the True/False values for each row, to give a True, if any of the rows contain2022-02
calender[...]
using the above boolean series to get only those rows that you want.
Hope this explains the code below -
calender[calender.apply(lambda row: row.str.find('2022-02')>=0).any(1)]
Sunday Monday Tuesday Wednesday Thursday \
February 2022-01-30 2022-01-31 2022-02-01 2022-02-02 2022-02-03
February 2022-02-06 2022-02-07 2022-02-08 2022-02-09 2022-02-10
February 2022-02-13 2022-02-14 2022-02-15 2022-02-16 2022-02-17
February 2022-02-20 2022-02-21 2022-02-22 2022-02-23 2022-02-24
March 2022-02-27 2022-02-28 2022-03-01 2022-03-02 2022-03-03
Friday Saturday
February 2022-02-04 2022-02-05
February 2022-02-11 2022-02-12
February 2022-02-18 2022-02-19
February 2022-02-25 2022-02-26
March 2022-03-04 2022-03-05
Avoid comparing date strings as 2022-01-10 > 2022-02-11
Just to add here, for posterity, comparing dates as strings using >
or <=
is not a good practice, and its discussed in detail on this popular SO post here.
Furthermore, doing string comparisons only compares Unicode values for each character, and is prone to failed cases such as this '10/15/13' > '10/12/14'
So I would highly recommend avoiding these types of coding practices.
CodePudding user response:
Use from this code
df[(df.Saturday <= '2022-03-06')&(df.Sunday >= '2022-01-26')]
it is very simple and fast If your dataset is large before this line you can use from convert_dtypes method(some times pandas use from object for data type and it is very slow)