Pandas: How to convert integer born date to age

Time:09-17

I have a DataFrame, which you can reproduce by running this code:

import pandas as pd
from io import StringIO

dfs = """
BornDate
2   19850100
3   19000100
5   19850100
6   19000100
7   19820100
8   19850100
9   19000100
10  19790100
11  19850100

"""

df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+',
                 dtype={"BornDate": int})
df

The output is:

BornDate
2   19850100
3   19000100
5   19850100
6   19000100
7   19820100
8   19850100
9   19000100
10  19790100
11  19850100

I need to convert BornDate to an age based on a variable ValuationDate, where the age equals ValuationDate minus BornDate:

ValuationDate = 20201231
ValuationDate=pd.to_datetime(ValuationDate)
df['BornDate']=pd.to_datetime(df['BornDate'])

df['BornDate']=ValuationDate-df['BornDate']

df['BornDate']

But it returns:

2    00:00:00.000351
3    00:00:00.001201
5    00:00:00.000351
6    00:00:00.001201
7    00:00:00.000381
8    00:00:00.000351
9    00:00:00.001201
10   00:00:00.000411
11   00:00:00.000351
Name: BornDate, dtype: timedelta64[ns]

The output should be:

2    35.1
3    120.1
5    35.1
6    120.1
7    38.1
8    35.1
9    120.1
10   41.1
11   35.1

CodePudding user response:

Assuming your BornDate column is in the same YYYYmmdd format as ValuationDate, change your code as follows so that the integers are parsed with that format:

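ValuationDate = 20201231
# Parse the integers as YYYYmmdd dates instead of nanoseconds since the epoch
ValuationDate = pd.to_datetime(ValuationDate, format='%Y%m%d')
df['BornDate'] = pd.to_datetime(df['BornDate'], format='%Y%m%d')

# Divide by one average year (365.2425 days, the same length as np.timedelta64(1, 'Y'));
# recent pandas versions reject the 'Y' unit in this division
df['BornDate'] = (ValuationDate - df['BornDate']) / pd.Timedelta(days=365.2425)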

Data Input

The day of each date was changed from the invalid 00 to 01, since a day of 00 cannot be parsed with '%Y%m%d':

    BornDate
2   19850101
3   19000101
5   19850101
6   19000101
7   19820101
8   19850101
9   19000101
10  19790101
11  19850101

Output

      BornDate
2    35.998001
3   120.999062
5    35.998001
6   120.999062
7    38.998747
8    35.998001
9   120.999062
10   41.999493
11   35.998001
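
The day could also be corrected in code rather than by editing the data by hand. A minimal sketch, assuming that a day of 00 should be read as the first of the month (the question does not say what 00 means) and that the month is always valid; `valuation_date`, `fixed`, `born` and `age` are illustrative names only:

```python
import pandas as pd

df = pd.DataFrame(
    {"BornDate": [19850100, 19000100, 19850100, 19000100, 19820100,
                  19850100, 19000100, 19790100, 19850100]},
    index=[2, 3, 5, 6, 7, 8, 9, 10, 11],
)
valuation_date = pd.to_datetime("20201231", format="%Y%m%d")

# Assumption: a day of 00 means "day unknown"; treat it as the 1st of the month.
# where() keeps values whose day is non-zero and replaces the others with value + 1.
fixed = df["BornDate"].where(df["BornDate"] % 100 != 0, df["BornDate"] + 1)

# Parse the integers as YYYYmmdd strings, then express the difference in
# average years of 365.2425 days.
born = pd.to_datetime(fixed.astype(str), format="%Y%m%d")
age = (valuation_date - born) / pd.Timedelta(days=365.2425)
print(age.round(6))
```

The original column is left untouched here, so the result can be stored in a separate column if the raw values are still needed.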

CodePudding user response:

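Use dt.microseconds on the result of your original code. Without a format, pd.to_datetime reads the integers as nanoseconds since the epoch, so each difference is a tiny timedelta (for example 20201231 - 19850100 = 351131 ns) whose microseconds component is 351: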

>>> df['BornDate'].dt.microseconds.div(10)
2      35.1
3     120.1
5      35.1
6     120.1
7      38.1
8      35.1
9     120.1
10     41.1
11     35.1
Name: BornDate, dtype: float64
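
The same figures can probably be reached without any datetime conversion. A rough sketch, assuming BornDate is still the original integer column from the question; `valuation_date` and `age` are illustrative names. Note that this only reproduces the requested output: the digit after the decimal point is the leading digit of the month difference, not a fraction of a year.

```python
import pandas as pd

df = pd.DataFrame(
    {"BornDate": [19850100, 19000100, 19850100, 19000100, 19820100,
                  19850100, 19000100, 19790100, 19850100]},
    index=[2, 3, 5, 6, 7, 8, 9, 10, 11],
)
valuation_date = 20201231  # plain integer in YYYYmmdd form

# Subtracting the YYYYmmdd integers gives e.g. 351131 for 19850100.
# Dropping the last three digits leaves 351, and dividing by 10 gives 35.1.
age = (valuation_date - df["BornDate"]) // 1000 / 10
print(age)
```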