How to calculate the age from date of birth when there are NaN in between some of the rows in pandas

Time:07-18

I want to calculate the age from the date of birth in my pandas DataFrame. However, some of the values in the column are NaN, and they raise an error because they do not match the expected date format. This is my code:

import pandas as pd
from datetime import datetime, date

dob = {'DOB': ['11/29/1986', 'NaN', '02/23/2006']}

# Creating dataframe
df33 = pd.DataFrame(data = dob)

# This function converts given date to age
def age(born):
    born = datetime.strptime(born, "%m/%d/%Y").date()
    today = date.today()
    return today.year - born.year - ((today.month, 
                                      today.day) < (born.month, 
                                                    born.day))
  
df33['Age'] = df33['DOB'].apply(age)

display(df33)

How should I modify the code so that it ignores the NaN values and still computes the age for the other rows? Rows with NaN can simply stay as NaN. Any help or advice will be greatly appreciated!

CodePudding user response:

You can add exception handling to your age function so that values it cannot parse are returned unchanged.

import pandas as pd
from datetime import datetime, date

# added pd.NaT to posted data
dob = {'DOB': ['11/29/1986', 'NaN', pd.NaT, '02/23/2006']}

# Creating dataframe
df33 = pd.DataFrame(data = dob)

def age(born):
    try:
        born = datetime.strptime(born, "%m/%d/%Y").date()
        today = date.today()
        return today.year - born.year - ((today.month, 
                                          today.day) < (born.month, 
                                                        born.day))
    except (ValueError, TypeError):
        return born    # leave unchanged

df33['Age'] = df33['DOB'].apply(age)

display(df33)

Output

          DOB  Age
0  11/29/1986   35
1         NaN  NaN
2         NaT  NaT
3  02/23/2006   16

CodePudding user response:

Note: The answer below is only meant to fix your immediate issue. I recommend using a library such as python-dateutil (its relativedelta class) to compute the age correctly.

The 'NaN' in your data is the string 'NaN', not numpy.nan. You should change the dob dictionary to:

dob = {'DOB': ['11/29/1986', pd.NaT, '02/23/2006']}

For datetime data it's better to use pandas' NaT value to mark a missing date.

You can then convert the column with pd.to_datetime and carry on from there.
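
A minimal sketch of that approach, assuming the column is parsed with pd.to_datetime and errors="coerce" so that anything unparseable becomes NaT. The helper name age_from_dob and its optional today argument are illustrative and not part of the original post:

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({"DOB": ["11/29/1986", "NaN", pd.NaT, "02/23/2006"]})

# Entries that cannot be parsed as dates end up as NaT.
df["DOB"] = pd.to_datetime(df["DOB"], format="%m/%d/%Y", errors="coerce")


def age_from_dob(born, today=None):
    """Return the age in whole years, or NaN when the birth date is missing."""
    if pd.isna(born):
        return float("nan")
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


df["Age"] = df["DOB"].apply(age_from_dob)
print(df)
```

Passing a fixed today, for example age_from_dob(pd.Timestamp("1986-11-29"), today=date(2022, 7, 18)), makes the result reproducible instead of depending on the day the code is run.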


But a quick fix that does not require changing the dob dictionary is to add this check at the beginning of the age function:

if born == 'NaN':
    return 'NaN'
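
In context, the check sits at the top of the question's function. The sketch below assumes missing entries are either the literal string 'NaN' or a real missing value (np.nan, None, pd.NaT). The extra pd.isna test and the optional today parameter are additions for illustration, not part of the original answer:

```python
from datetime import date, datetime

import pandas as pd


def age(born, today=None):
    # Pass missing entries through untouched: the literal string 'NaN'
    # from the question, or a real missing value such as np.nan or pd.NaT.
    if born == "NaN" or pd.isna(born):
        return born
    born = datetime.strptime(born, "%m/%d/%Y").date()
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


df33 = pd.DataFrame({"DOB": ["11/29/1986", "NaN", "02/23/2006"]})
df33["Age"] = df33["DOB"].apply(age)
print(df33)
```

Because the string 'NaN' is returned as-is, the Age column ends up with mixed types (integers and a string). Converting the placeholder to a real missing value first avoids that.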

CodePudding user response:

This can be done using relativedelta without using a separate function.

Install the module

pip install python-dateutil

CODE

import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
import numpy as np

dob = {'DOB': ['11/29/1986', np.nan, '02/23/2006']}

# Creating dataframe
df33 = pd.DataFrame(data=dob)

df33["DOB"] = pd.to_datetime(df33["DOB"])
df33["Age"] = df33.apply(lambda x: relativedelta(datetime.now().date(), x['DOB']).years if x.notnull().all() else pd.NaT, axis=1)
print(df33)

OUTPUT

         DOB   Age
0 1986-11-29  35.0
1        NaT   NaT
2 2006-02-23  16.0