I want to calculate age from the date of birth in my pandas DataFrame. However, some of the dates in the column are NaN,
which causes an error because they do not match the expected date format. This is my code:
import pandas as pd
from datetime import datetime, date

dob = {'DOB': ['11/29/1986', 'NaN', '02/23/2006']}
# Creating dataframe
df33 = pd.DataFrame(data=dob)

# This function converts a given date string to an age in years
def age(born):
    born = datetime.strptime(born, "%m/%d/%Y").date()
    today = date.today()
    # Subtract one year if this year's birthday has not happened yet
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

df33['Age'] = df33['DOB'].apply(age)
display(df33)
How should I modify the code so that it ignores the NaN
values and still computes the age for the other rows? Rows with NaN
can simply stay as NaN.
Any help or advice will be greatly appreciated!
CodePudding user response:
You can leave those rows unchanged by adding exception handling to the age
function.
import pandas as pd
from datetime import datetime, date

# added pd.NaT to posted data
dob = {'DOB': ['11/29/1986', 'NaN', pd.NaT, '02/23/2006']}
# Creating dataframe
df33 = pd.DataFrame(data=dob)

def age(born):
    try:
        born = datetime.strptime(born, "%m/%d/%Y").date()
        today = date.today()
        # Subtract one year if this year's birthday has not happened yet
        return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    except (ValueError, TypeError):
        # ValueError: string is not a valid date; TypeError: value is not a string
        return born  # leave unchanged

df33['Age'] = df33['DOB'].apply(age)
display(df33)
Output
          DOB  Age
0  11/29/1986   35
1         NaN  NaN
2         NaT  NaT
3  02/23/2006   16
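The ages above depend on the day the code is run, because the function calls date.today(). As a minimal sketch (not part of the original answer), the same logic can take the reference date as a parameter so the result is reproducible. The name age_on and the fixed date 2022-06-01 are assumptions chosen for illustration.

```python
from datetime import date, datetime

def age_on(born, today):
    """Return the age in whole years on `today`, or `born` unchanged if it cannot be parsed."""
    try:
        born_date = datetime.strptime(born, "%m/%d/%Y").date()
    except (ValueError, TypeError):
        # Not a valid date string (e.g. 'NaN') or not a string at all (e.g. None)
        return born
    # Subtract one year if the birthday has not happened yet in the reference year
    return today.year - born_date.year - (
        (today.month, today.day) < (born_date.month, born_date.day)
    )

reference = date(2022, 6, 1)  # hypothetical fixed reference date
ages = [age_on(d, reference) for d in ['11/29/1986', 'NaN', None, '02/23/2006']]
print(ages)  # [35, 'NaN', None, 16]
```

Passing date.today() as the second argument gives the same behaviour as the function above.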
CodePudding user response:
Note: The answer below is only a quick fix for your issue. I recommend using relativedelta from the dateutil library
to compute the age correctly.
The 'NaN' in your data is the string 'NaN', not numpy.nan. You should modify the dob
dictionary as:
dob = {'DOB': ['11/29/1986', pd.NaT, '02/23/2006']}
For datetime types it is better to use the pandas NaT value to indicate a missing time.
You can then convert the column with pd.to_datetime and work with it from there.
A quick fix that does not require modifying the dob dictionary
is to add this check at the beginning of the age function:
    if born == 'NaN':
        return 'NaN'
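The conversion route mentioned above could look roughly like the following. This is a sketch under assumptions, not the answerer's code: errors='coerce' is used so that the string 'NaN' becomes NaT without editing the dictionary, the reference date is fixed at 2022-06-01 for reproducibility, and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'DOB': ['11/29/1986', 'NaN', '02/23/2006']})

# Unparseable entries such as the string 'NaN' become NaT.
born = pd.to_datetime(df['DOB'], format='%m/%d/%Y', errors='coerce')

# Assumption: a fixed reference date so the result is reproducible;
# use pd.Timestamp.today() for the current date.
today = pd.Timestamp('2022-06-01')

# True where the birthday has not happened yet in the reference year.
birthday_pending = (born.dt.month > today.month) | (
    (born.dt.month == today.month) & (born.dt.day > today.day)
)

# NaT rows propagate as missing values; Int64 is the pandas nullable integer dtype.
age = today.year - born.dt.year - birthday_pending.astype(int)
df['Age'] = age.astype('Int64')
print(df)
```

With this reference date the Age column holds 35, a missing value, and 16, and no apply call or exception handling is needed.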
CodePudding user response:
This can be done using relativedelta,
without writing a separate function.
Install the module:
pip install python-dateutil
CODE
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
import numpy as np
dob = {'DOB': ['11/29/1986', np.nan, '02/23/2006']}
# Creating dataframe
df33 = pd.DataFrame(data=dob)
df33["DOB"] = pd.to_datetime(df33["DOB"])
df33["Age"] = df33.apply(lambda x: relativedelta(datetime.now().date(), x['DOB']).years if x.notnull().all() else pd.NaT, axis=1)
print(df33)
OUTPUT
         DOB   Age
0 1986-11-29  35.0
1        NaT   NaT
2 2006-02-23  16.0
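A possible variant of the code above, offered as a sketch rather than a replacement: applying the lambda to the DOB column alone means the null check only looks at the date of birth, not at every column in the row. The fixed reference date 2022-06-01 and the use of pd.NA for missing ages are assumptions made for illustration.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'DOB': ['11/29/1986', None, '02/23/2006']})
df['DOB'] = pd.to_datetime(df['DOB'], format='%m/%d/%Y')

# Assumption: fixed reference date for reproducibility;
# use pd.Timestamp.today() for the current date.
today = pd.Timestamp('2022-06-01')

# relativedelta(a, b).years is the number of whole years between b and a.
df['Age'] = df['DOB'].apply(
    lambda d: relativedelta(today, d).years if pd.notna(d) else pd.NA
)
print(df)
```

Because the column mixes integers with a missing-value marker, it ends up with object dtype; df['Age'].astype('Int64') converts it to a nullable integer column if that is preferred.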