Iterating through a range of dates in Python with missing dates

Time:04-19

I have a pandas DataFrame of daily stock returns with two columns: the date and the return rate. I only want to keep the last day of each week, but the data has some missing days. How can I do that?

import pandas as pd

df = pd.read_csv('Daily_return.csv')
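df.Date = pd.to_datetime(df.Date)
count = 300
# Idea: step forward 7 days at a time from a start date ("+ 7n")
for last_day in (pd.Timestamp('2017-01-01') + pd.Timedelta(days=7 * n) for n in range(count)):
    ...  # stuck here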

My brain stopped working at this point. One of the biggest problems is that stepping by "+ 7n" days is meaningless when some dates are missing.

CodePudding user response:

I'll create a sample dataset with 40 dates and 40 sample returns, then sample 90 percent of that randomly to simulate the missing dates.

The key here is that you need to convert your date column into datetime if it isn't already, and make sure your df is sorted by the date.

Then you can group by year/week and take the last value in each group. Because the sample is random, running this repeatedly will show that the selected dates change whenever the dropped row was the last day of a week.
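As a small illustration of why the grouping uses the ISO year together with the ISO week, here is a sketch of what those keys look like around a year boundary. The dates are hypothetical and chosen only for this example. Grouping by week number alone, or by the calendar year, could split or merge the week that spans New Year.

```python
import pandas as pd

# Hypothetical dates that straddle a year boundary
dates = pd.Series(pd.to_datetime(['2020-12-31', '2021-01-01',
                                  '2021-01-03', '2021-01-04']))

# isocalendar() returns a DataFrame with columns: year, week, day
iso = dates.dt.isocalendar()

# Pair up the ISO year and ISO week for each date
keys = list(zip(iso['year'].tolist(), iso['week'].tolist()))
print(keys)
# [(2020, 53), (2020, 53), (2020, 53), (2021, 1)]
```

The first three dates share one key even though two of them fall in calendar year 2021, so they end up in the same weekly group.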

Putting those steps together:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['date'] = pd.date_range(start='04-18-2022',periods=40, freq='D')
df['return'] = np.random.uniform(size=40)

# Keep 90 percent of the records so we can see what happens when some days are missing
df = df.sample(frac=.9)

# In case your dates are actually strings
df['date'] = pd.to_datetime(df['date'])

# Make sure they are sorted from oldest to newest
df = df.sort_values(by='date')

df = df.groupby([df['date'].dt.isocalendar().year,
                 df['date'].dt.isocalendar().week], as_index=False).last()

print(df)

Output

       date    return
0 2022-04-24  0.299958
1 2022-05-01  0.248471
2 2022-05-08  0.506919
3 2022-05-15  0.541929
4 2022-05-22  0.588768
5 2022-05-27  0.504419
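Since the sample above is random, the effect of a missing last day is easier to see with fixed data. The following is a minimal sketch with made-up returns and two hand-picked missing dates (both are assumptions for illustration). It uses `groupby(...).tail(1)` rather than `.last()`, a small variation that keeps the original rows as they are.

```python
import pandas as pd

# Two full weeks (Mon 2022-04-18 .. Sun 2022-05-01) with made-up returns
df = pd.DataFrame({
    'date': pd.date_range(start='2022-04-18', periods=14, freq='D'),
    'return': [round(0.01 * i, 2) for i in range(14)],
})

# Drop Sunday 2022-04-24 and Wednesday 2022-04-27 to simulate missing days
missing = pd.to_datetime(['2022-04-24', '2022-04-27'])
df = df[~df['date'].isin(missing)].sort_values(by='date')

# Label each row with its ISO year and week, then keep the last row per label
iso = df['date'].dt.isocalendar()
weekly = df.groupby([iso['year'], iso['week']]).tail(1).reset_index(drop=True)

print(weekly)
```

In the first week the Sunday is missing, so the Saturday (2022-04-23) is kept instead. In the second week only a mid-week day is missing, so the Sunday (2022-05-01) is still selected.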