I need a function to count the total number of days in the 'days' column between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 in a dataframe taking Leapyears into account as well
For example:
- 1st Jan 1995 - Day 1
- 1st Feb 1995 - Day 32
- 2nd Feb 1995 - Day 33...
And so on all the way to 31st Dec 2019.
This is the function I created initially but it doesn't work.
prices
is the name of the data frame and 'days'
is the column where the number of days is to reflect.
def date_difference(self):
for i in range(prices.shape[0] - 1):
prices['days'][i 1] = (prices['days'][i 1] - prices['days'][i])
CodePudding user response:
Convert types
First of all, make sure that the days
column is the proper type. Use df.days.dtype
and it should be datetime64
. If you get object
type that means you have a string containing a date and you need to convert the type using
df.days = pd.to_datetime(df.days)
Calculate difference
df['days_diff'] = (df.days - pd.Timestamp('1995-01-01')).dt.days
Also, I would recommend changing the name of the column to date before it contains dates. Later you can assign the days to a column called so. It's just for clarity of your code and future maintaining it.
CodePudding user response:
I finally got it to work by doing this:
def date_difference(last_day):
last_day = pd.to_datetime(last_day, dayfirst = True)
first_day = pd.to_datetime("01/01/1995", dayfirst = True)
diff = last_day - first_day
prices['days'] = prices['days'].apply(date_difference)