I have a data frame as follows.
import pandas as pd
df = pd.DataFrame({"Date":["09-01-2021","09-02-2021","09-02-2021","09-04-2021"]})
How do I find all the dates between the minimum and maximum dates in the Date column of a data frame?
Here, "09-03-2021" is missing, so the expected output is ["09-03-2021"], i.e. a list of all missing dates.
Answer:
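You can use the pd.date_range function to build the full range of days and check whether it matches the unique values in your dataframe. Convert the column to datetime first: otherwise min/max compare strings lexicographically, and a set of Timestamps never equals a set of strings, so the check would always return False.
import pandas as pd
df = pd.DataFrame({"Date":["09-01-2021","09-02-2021","09-02-2021","09-04-2021"]})
df['Date'] = pd.to_datetime(df['Date'], format="%m-%d-%Y")  # parse so comparisons are chronological
rng = pd.date_range(df['Date'].min(), df['Date'].max())
print(set(rng) == set(df['Date']))  # True only if no date in the range is missing
#returns False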
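The equality check above only tells you whether some date is missing. A minimal sketch of getting the actual missing dates with the same sets is to take their difference; it assumes the strings are month-day-year, as the expected output in the question suggests.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["09-01-2021", "09-02-2021", "09-02-2021", "09-04-2021"]})
# Assumption: the strings are month-day-year
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

rng = pd.date_range(df["Date"].min(), df["Date"].max())
# Set difference keeps the days in the full range that never appear in the column
missing = sorted(set(rng) - set(df["Date"]))
print(missing)
```

Sorting is needed because sets are unordered; the result is a list of Timestamp objects.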
Answer:
Question 1
How do I check all the dates between max and min dates in the date column in a data frame?
import pandas as pd
df = pd.DataFrame({"Date":["09-01-2021","09-02-2021","09-02-2021","09-04-2021"]})
df['Date'] = pd.to_datetime(df['Date'])
print(df.loc[(df['Date'] > min(df['Date'])) & (df['Date'] < max(df['Date']))])
| | Date |
|---|---|
| 1 | 2021-09-02 |
| 2 | 2021-09-02 |
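The same strict "between min and max" filter can also be written with Series.between. This is a sketch assuming pandas 1.3 or later, where inclusive accepts "neither" to exclude both endpoints like the > and < comparisons do.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["09-01-2021", "09-02-2021", "09-02-2021", "09-04-2021"]})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# inclusive="neither" drops rows equal to the min or max, matching the strict filter
inner = df[df["Date"].between(df["Date"].min(), df["Date"].max(), inclusive="neither")]
print(inner)
```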
EDIT
Question 2
How do I list all missing dates?
Reference (linked in the comments): "check for any missing dates in the index"
print(pd.date_range(start = min(df['Date']), end = max(df['Date'])).difference(df['Date']))
>>> DatetimeIndex(['2021-09-03'], dtype='datetime64[ns]', freq=None)
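The question asks for a plain list in the original "MM-DD-YYYY" form rather than a DatetimeIndex. A minimal sketch, assuming that month-day-year format, formats the result back with DatetimeIndex.strftime and converts it with tolist.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["09-01-2021", "09-02-2021", "09-02-2021", "09-04-2021"]})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

missing = pd.date_range(start=df["Date"].min(), end=df["Date"].max()).difference(df["Date"])
# Convert back to the question's month-day-year strings and return a plain list
missing_list = missing.strftime("%m-%d-%Y").tolist()
print(missing_list)  # ['09-03-2021']
```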
If the dataframe is already sorted by date, you can use the first and last rows instead of min and max:
print(pd.date_range(start = df.iloc[0]['Date'], end = df.iloc[-1]['Date']).difference(df['Date']))
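The iloc version silently gives a wrong range on unsorted data, because the first and last rows are then not the earliest and latest dates. A sketch on a hypothetical shuffled copy of the question's data shows that sorting first makes it safe; min and max avoid the sort entirely, so they are usually the simpler choice.

```python
import pandas as pd

# Hypothetical unsorted input: the question's dates in a shuffled order
df = pd.DataFrame({"Date": ["09-04-2021", "09-02-2021", "09-01-2021", "09-02-2021"]})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# Sort first so the first and last rows really are the earliest and latest dates
df_sorted = df.sort_values("Date")
missing = pd.date_range(start=df_sorted.iloc[0]["Date"], end=df_sorted.iloc[-1]["Date"]).difference(df_sorted["Date"])
print(missing)
```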