How to create a "duration" column from two date columns?

My "expeditions" dataframe has two date columns: "basecamp_date" (the start date) and "highpoint_date" (the end date). I would like to create a new column that holds the duration between these two dates, but I have no idea how to do it.

import pandas as pd

expeditions = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv")

CodePudding user response：

Have read_csv parse both columns as datetimes with parse_dates, then subtract the columns and use Series.dt.days to get the difference in whole days:

file = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv"
expeditions = pd.read_csv(file, parse_dates=['basecamp_date','highpoint_date'])
    
expeditions['diff'] = expeditions['highpoint_date'].sub(expeditions['basecamp_date']).dt.days
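
To try the same idea without downloading the file, here is a minimal sketch on a small in-memory CSV. The three rows and the name `sample` are made up for illustration and are not taken from the real dataset. It also shows what happens when one of the dates is missing: the difference for that row comes out as NaN, so the column ends up as floats.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the real expeditions.csv
csv_text = """expedition_id,basecamp_date,highpoint_date
A,1978-04-01,1978-04-20
B,1978-09-10,1978-10-02
C,1979-03-15,
"""

# parse_dates converts both columns to datetimes; the empty cell becomes NaT
sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["basecamp_date", "highpoint_date"],
)

# Datetime minus datetime gives a timedelta column; .dt.days keeps whole days
sample["diff"] = sample["highpoint_date"].sub(sample["basecamp_date"]).dt.days

print(sample[["expedition_id", "diff"]])
```

If integer days are needed despite missing values, one option is to cast the result to pandas' nullable integer type with `.astype("Int64")`.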

CodePudding user response：

You can convert those columns to datetime and then subtract them to get the duration:

tstart = pd.to_datetime(expeditions['basecamp_date'])
tend = pd.to_datetime(expeditions['highpoint_date'])

# Subtracting two datetime Series already yields a timedelta Series
expeditions['duration'] = tend - tstart
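
As a self-contained sketch of the to_datetime approach, the snippet below builds a small dataframe by hand (the values and the name `df` are hypothetical, not from the real dataset). It passes `errors="coerce"` as a precaution, on the assumption that some cells may be missing or unparseable, so those become NaT instead of raising. It then stores both the raw timedelta and a plain day count.

```python
import pandas as pd

# Hypothetical sample; the date columns start out as plain strings
df = pd.DataFrame({
    "basecamp_date": ["1978-04-01", "1978-09-10", None],
    "highpoint_date": ["1978-04-20", "1978-10-02", "1979-05-01"],
})

# errors="coerce" turns missing or unparseable values into NaT
tstart = pd.to_datetime(df["basecamp_date"], errors="coerce")
tend = pd.to_datetime(df["highpoint_date"], errors="coerce")

# Element-wise subtraction gives a timedelta64 column
df["duration"] = tend - tstart

# Whole days as numbers (NaN where either date is missing)
df["duration_days"] = df["duration"].dt.days

print(df[["duration", "duration_days"]])
```

Keeping the timedelta column is handy if you later want other units, for example `df["duration"].dt.total_seconds()`.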