How to filter data in Pandas by custom date?-CodePudding

There are some columns in Pandas dataframe:

name age mm yy
         01 23

How to filter rows where mmyy < current date using custom function (apply)?

mm and yy is as 8 non-null object

I tried this way:

from datetime import datetime

# mm = datetime.strptime(mm, "%m").strftime("%m")
# mm = datetime.strptime(mm, "%b").strftime("%m")  #str

mm = '06'
yy = '22'

cdd, cmm, cyy = datetime.today().strftime("%d %m %Y").split()
yy = datetime.strptime(yy, '%y').strftime("%Y")
mm = datetime.strptime(mm, "%m").strftime("%m")
 
dt = datetime(year=int(yy), month=int(mm), day=1)
present = datetime.now()
print(dt <present)

So, how to wrap this into custom function and apply in filter Pandas?

CodePudding user response：

Here's a way to do what your question asks, assuming the mm and yy columns contain strings:

import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob'], 'age':[20, 30], 'mm':['01', '04'], 'yy':['23', '22']})
print(df)

from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%m/%d").date() < now, axis=1)]
print(df)

Input

    name  age  mm  yy
0  Alice   20  01  23
1    Bob   30  04  22

Output

  name  age  mm  yy
1  Bob   30  04  22

UPDATE: Based on OP's question in comments, this is how to do it if month is an abbreviated name like Jan for January (full docs on format codes here):

import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob'], 'age':[20,30], 'mm':['Jan','Apr'], 'yy':['23','22']})
print(df)

from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%b/%d").date() < now, axis=1)]
print(df)

Input

    name  age   mm  yy
0  Alice   20  Jan  23
1    Bob   30  Apr  22

Output

  name  age   mm  yy
1  Bob   30  Apr  22

UPDATE #2: Here's a way to make this work for heterogeneous mm values (could be like '01', could be like 'Jan' for January):

import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob', 'Carol', 'Dexter'], 'age':[20,30,40,50], 'mm':['Jan','Apr', '02', '03'], 'yy':['23','22','22','23']})
print(df)

from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%b/%d" if str.isalpha(x.mm) else "%y/%m/%d").date() < now, axis=1)]
print(df)

Input

     name  age   mm  yy
0   Alice   20  Jan  23
1     Bob   30  Apr  22
2   Carol   40   02  22
3  Dexter   50   03  23

Output

    name  age   mm  yy
1    Bob   30  Apr  22
2  Carol   40   02  22