Currently this is how my dataset looks
Num Name Type Status Current % Next Milestone\nDate
1 sam - Open 54% 2023-01-16 00:00:00
2 Dave - Open 54% 2023-01-20 00:00:00
3 Jake - Open 45% 2023-01-13 00:00:00
4 Oli - Open 30% 2023-01-31 00:00:00
I would like to filter this data to only show the next milestone dates due this week. Also i want the format to not include the time. Todays date 20/01/2023. i only want data from 16th-20th and currently think maybe i should filter for index upto 5 however, im really unsure.
Num Name Type Status Current % Next Milestone\nDate
1 sam - Open 54% 2023-01-16
2 Dave - Open 54% 2023-01-10
3 Jack - Open 45% 2023-01-17
4 Dean - Open 30% 2023-01-19
i tried using
for loop returns list 'days' with days of the week starting with current day i.e. Wednesday will give [2,3,4,0,1]
days = []
for i in range(7):
day = datetime.weekdays(datetime.today()) i
if day%7<5:
days.append(day%7)
## for loop iterates until the date reaches friday (stops increasing),
then the else appends the rest by going backwards
filters = [0]
for i in range(4):
if days[i 1]>days[i]:
filters.append(1 i)
else:
for j in range(4-i):
filters.append(-(j 1))
break
## 'filters' contains dates relative to 'today' i.e. on tuesday, filters
= [1,2,3,-1] <==> [We,Th,Fr,Mo]' i.e. on tuesday, weekdays = [1,2,3,-1]
<==> [We,Th,Fr,Mo]
## 'weekdays' will contain all the weekdays needed for the filters to be
used on the dataset
weekdays = []
## for loop appends the dates needed for the filter, by adding the
relative dates seen in weekdays using timedelta
for i in range(5):
other_days = datetime.today() timedelta(days=filters[i])
o_days = pd.to_datetime(other_days).date() ## convert to datetime
weekdays.append(o_days)
if statement adds the next week for thursday and friday
if datetime.weekday(datetime.today()) == 3 or
datetime.weekday(datetime.today()) == 4:
for i in range(5):
other_days = datetime.today() timedelta(days=filters[i] 7) ## same method as before this adds the days 7 days after each day in the current week
o_days = pd.to_datetime(other_days).date()
weekdays.append(o_days)
## converts weekdays to datetime64 data type so it can be filtered
weekdays = np.array(weekdays, dtype='datetime64')
## applies filter
dataset_modified = dataset_modified[(dataset_modified['Next Milestone\nDate'] > min(weekdays)) & (dataset_modified['Next Milestone\nDate']< max(weekdays))]
it gives error AttributeError: type object 'datetime.datetime' has no attribute 'weekdays'
CodePudding user response:
df['Next Milestone\nDate'] = pd.to_datetime(df['Next Milestone\nDate']).dt.date
df = df[(df['Next Milestone\nDate'] <= pd.to_datetime('20/01/2023'))&
(df['Next Milestone\nDate'] >= pd.to_datetime('16/01/2023')))
CodePudding user response:
Use Series.dt.normalize
for datetimes with 00:00:00
times and then filter in boolean indexing
by Series.between
:
df['Next Milestone\nDate'] = pd.to_datetime(df['Next Milestone\nDate']).dt.normalize()
out = df[df['Next Milestone\nDate'].between('2023-01-16','2023-01-20')]
If converting to dates filtering like imburningbabe
solution it is slow (here 5.5 times):
#200k rows
df = pd.concat([df] * 100000, ignore_index=True)
In [189]: %%timeit
...: df['Next Milestone Date'] = pd.to_datetime(df['Next Milestone Date']).dt.date
...: df1 = df[(df['Next Milestone Date'] <= pd.to_datetime('20/01/2023'))&(df['Next Milestone Date'] >= pd.to_datetime('16/01/2023'))]
...:
282 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [190]: %%timeit
...: df['Next Milestone Date'] = pd.to_datetime(df['Next Milestone Date']).dt.normalize()
...:
...: out = df[df['Next Milestone Date'].between('2023-01-16','2023-01-20')]
...:
51.2 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)