For a pandas DataFrame, I can select (based on some criteria) one index value (a date), which I call 'first date', as well as the index value (also a date) of the very last row, which I call 'last date'.
I would like to explicitly calculate the difference in days between 'first date' and 'last date' (it should be 3 days). How can I do this?
Many thanks in advance!
import pandas as pd
df1 = pd.DataFrame({"date": ['2021-3-22', '2021-3-23', '2021-3-24', '2021-3-25', '2021-3-26'],
"x": ['1', 1, 'nan', 'nan', 'nan' ]})
df1.set_index('date', inplace=True)
df1
date x
2021-3-22 1
2021-3-23 1
2021-3-24 nan
2021-3-25 nan
2021-3-26 nan
print('first date:', df1.x[df1.x == 1].tail(1).index.values)
first date: ['2021-3-23']
(=d1)
print('last date:', df1.tail(1).index.values)
last date: ['2021-3-26']
(=d2)
d2-d1=?
CodePudding user response:
You forgot to set the type of the 'date' column. Currently, it is a string, but you want it to be a datetime instance instead.
df1 = pd.DataFrame({"date": ['2021-3-22', '2021-3-23', '2021-3-24', '2021-3-25', '2021-3-26'],
"x": ['nan', 1, 'nan', 'nan', 'nan' ]})
df1['date'] = pd.to_datetime(df1['date'])
df1 = df1.set_index('date')
Now you can subtract index values directly, for example the first row from the last row:
df1.index[-1] - df1.index[0] #Timedelta('4 days 00:00:00')
or
(df1.index[-1] - df1.index[0]).days #4
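Note that the question asks for the gap between the last row where x equals 1 and the very last row, which should be 3 days rather than 4. A minimal sketch of that calculation, assuming the same data and datetime index as above (the names d1, d2 follow the question; diff_days is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ['2021-3-22', '2021-3-23', '2021-3-24', '2021-3-25', '2021-3-26'],
    "x": ['nan', 1, 'nan', 'nan', 'nan'],
})
df1['date'] = pd.to_datetime(df1['date'])
df1 = df1.set_index('date')

# d1: index value of the last row where x equals 1 ('first date' in the question)
d1 = df1[df1['x'] == 1].index[-1]
# d2: index value of the very last row ('last date' in the question)
d2 = df1.index[-1]

# Subtracting two Timestamps gives a Timedelta; .days is the whole-day count
diff_days = (d2 - d1).days
print(diff_days)
```

Because both values are pandas Timestamps here, no extra conversion is needed before subtracting.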
CodePudding user response:
You need to convert the dates (the index values) to datetime format. You can use either the datetime library or pandas to do this. For your requirement, you can do the following:
(pd.to_datetime(df1.tail(1).index.values, format='%Y-%m-%d')
- pd.to_datetime(df1.x[df1.x == 1].tail(1).index.values, format='%Y-%m-%d')).days[0]
pd.to_datetime converts the string values to datetime values, so they can be subtracted from each other.
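For the datetime-library route mentioned above, a rough sketch using only the standard library could look like the following. The two date strings are copied from the question's output; the variable names are illustrative only.

```python
from datetime import datetime

# Date strings as they appear in the question's string index
first_date = '2021-3-23'
last_date = '2021-3-26'

# Parse each string into a datetime object
d1 = datetime.strptime(first_date, '%Y-%m-%d')
d2 = datetime.strptime(last_date, '%Y-%m-%d')

# Subtracting two datetime objects gives a timedelta; .days is the whole-day count
days_between = (d2 - d1).days
print(days_between)
```

This avoids touching the DataFrame at all, which may be convenient when only two values need to be compared.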