subtracting 2 date columns in dataframe

Time:12-13

I have the following pandas dataframe

[Screenshot of the dataframe, not preserved]

I want to subtract date_of_birth from date_of_death to create a new column, "years_lived", containing the number of years lived. I tried each of the three approaches below (individually, of course):

df['years_lived'] = (df['date_of_death'] - df['date_of_birth']).dt.days
df['years_lived'] = df['date_of_death'].sub(df['date_of_birth'], axis=0)
df['years_lived'] = df['date_of_death'] - df['date_of_birth']

but I got a TypeError: unsupported operand type(s) for -: 'str' and 'str'
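
The 'str' and 'str' in the message suggests that both columns hold text rather than datetimes. A minimal sketch of how one might confirm that, using made-up sample values because the original dataframe is not shown:

```python
import pandas as pd

# Hypothetical sample data; the real dataframe from the question is not available.
df = pd.DataFrame({
    "date_of_birth": ["1900-01-15", "1950-06-30"],
    "date_of_death": ["1980-01-14", "2020-06-30"],
})

# Inspect the column dtypes; date strings do not show up as datetime64.
print(df.dtypes)

# True only if the column already holds datetime values.
birth_is_datetime = pd.api.types.is_datetime64_any_dtype(df["date_of_birth"])

# After parsing, the same check passes.
parsed_birth = pd.to_datetime(df["date_of_birth"])
parsed_is_datetime = pd.api.types.is_datetime64_any_dtype(parsed_birth)
```

If the check returns False, the columns need to be parsed before any date arithmetic will work.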

CodePudding user response:

df['years_lived'] = pd.to_datetime(df['date_of_death']) - pd.to_datetime(df['date_of_birth'])

CodePudding user response:

pd.to_datetime will convert your date strings into datetime values, so you can then subtract one column from the other.

https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
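
Subtracting two datetime columns gives a timedelta, not a number of years. One possible way to get completed years, sketched here on made-up sample data (the values and the birthday-based rule are assumptions, not something the question specifies):

```python
import pandas as pd

# Hypothetical sample data; the real dataframe from the question is not available.
df = pd.DataFrame({
    "date_of_birth": ["1900-01-15", "1950-06-30"],
    "date_of_death": ["1980-01-14", "2020-06-30"],
})

birth = pd.to_datetime(df["date_of_birth"])
death = pd.to_datetime(df["date_of_death"])

# True where the death date falls before the birthday in the final calendar year.
before_birthday = (death.dt.month < birth.dt.month) | (
    (death.dt.month == birth.dt.month) & (death.dt.day < birth.dt.day)
)

# Completed years: calendar-year difference, minus one if the last birthday was not reached.
df["years_lived"] = death.dt.year - birth.dt.year - before_birthday.astype(int)
```

This counts whole years the way ages are usually counted, which avoids the small drift that comes from dividing a day count by a fixed year length.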

CodePudding user response:

You need to convert the string representation of the dates to datetimes before subtracting.

# Parse both columns with pd.to_datetime first; note the result is a number of days, not years
df['years_lived'] = (pd.to_datetime(df['date_of_death']) - pd.to_datetime(df['date_of_birth'])).dt.days

or

If that doesn't work, convert the columns first and then subtract:

# errors='coerce' turns unparseable values into NaT instead of raising
df['date_of_death'] = pd.to_datetime(df['date_of_death'], errors='coerce')
df['date_of_birth'] = pd.to_datetime(df['date_of_birth'], errors='coerce')
# .dt.days gives the difference in days (NaN where either date is NaT)
df['years_lived'] = (df['date_of_death'] - df['date_of_birth']).dt.days
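
As a rough sketch of how errors='coerce' behaves with pd.to_datetime, and of turning a day count into approximate years: the sample values, the "unknown" entry, and the 365.25 divisor are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data with one unparseable birth date.
df = pd.DataFrame({
    "date_of_birth": ["1900-01-15", "unknown"],
    "date_of_death": ["1980-01-14", "2020-06-30"],
})

# Unparseable strings become NaT instead of raising an error.
birth = pd.to_datetime(df["date_of_birth"], errors="coerce")
death = pd.to_datetime(df["date_of_death"], errors="coerce")

# Difference in days; rows with a missing date come out as NaN.
days = (death - birth).dt.days

# Approximate years, using 365.25 days per year as an average year length.
df["years_lived"] = days / 365.25
```

Rows whose dates could not be parsed end up as NaN in years_lived, so they can be inspected or dropped afterwards rather than silently counted as zero.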