There are some columns in Pandas dataframe:
name age mm yy
01 23
How to filter rows where mmyy < current date using custom function (apply)?
mm and yy is as 8 non-null object
I tried this way:
from datetime import datetime
# mm = datetime.strptime(mm, "%m").strftime("%m")
# mm = datetime.strptime(mm, "%b").strftime("%m") #str
mm = '06'
yy = '22'
cdd, cmm, cyy = datetime.today().strftime("%d %m %Y").split()
yy = datetime.strptime(yy, '%y').strftime("%Y")
mm = datetime.strptime(mm, "%m").strftime("%m")
dt = datetime(year=int(yy), month=int(mm), day=1)
present = datetime.now()
print(dt <present)
So, how to wrap this into custom function and apply in filter Pandas?
CodePudding user response:
Here's a way to do what your question asks, assuming the mm
and yy
columns contain strings:
import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob'], 'age':[20, 30], 'mm':['01', '04'], 'yy':['23', '22']})
print(df)
from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%m/%d").date() < now, axis=1)]
print(df)
Input
name age mm yy
0 Alice 20 01 23
1 Bob 30 04 22
Output
name age mm yy
1 Bob 30 04 22
UPDATE: Based on OP's question in comments, this is how to do it if month is an abbreviated name like Jan
for January (full docs on format codes here):
import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob'], 'age':[20,30], 'mm':['Jan','Apr'], 'yy':['23','22']})
print(df)
from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%b/%d").date() < now, axis=1)]
print(df)
Input
name age mm yy
0 Alice 20 Jan 23
1 Bob 30 Apr 22
Output
name age mm yy
1 Bob 30 Apr 22
UPDATE #2: Here's a way to make this work for heterogeneous mm
values (could be like '01', could be like 'Jan' for January):
import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob', 'Carol', 'Dexter'], 'age':[20,30,40,50], 'mm':['Jan','Apr', '02', '03'], 'yy':['23','22','22','23']})
print(df)
from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%b/%d" if str.isalpha(x.mm) else "%y/%m/%d").date() < now, axis=1)]
print(df)
Input
name age mm yy
0 Alice 20 Jan 23
1 Bob 30 Apr 22
2 Carol 40 02 22
3 Dexter 50 03 23
Output
name age mm yy
1 Bob 30 Apr 22
2 Carol 40 02 22