I would like to return a list of all unique dates - so reduced to days from the index. I am having trouble because the Date column is set as the index of the DataFrame and I want to do it without having to reset the index.
This is a sample of my DataFrame.
Input:
Open High Low Close Volume
Date
2022-03-31 09:30:00 177.3628 177.5123 176.5451 176.5850 10809
2022-03-31 09:35:00 176.5750 176.6448 176.1163 176.1662 2647
2022-03-31 09:40:00 176.1762 176.1861 175.3984 175.7274 2540
2022-03-31 09:45:00 175.7274 175.7274 175.1192 175.3286 3495
2022-03-31 09:50:00 175.3186 175.6576 175.2987 175.5280 1904
... ... ... ... ... ...
2022-09-21 15:35:00 155.9000 155.9400 154.8400 154.9300 3131
2022-09-21 15:40:00 154.9000 155.2300 154.6000 154.6000 3181
2022-09-21 15:45:00 154.5700 154.6000 154.1300 154.3700 3999
2022-09-21 15:50:00 154.3200 154.8100 154.2400 154.5100 4606
2022-09-21 15:55:00 154.5000 154.5000 153.6700 153.7100 10145
Desired output:
Dates = ["2022-03-31", "2022-04-01", "2022-04-02", ...]
CodePudding user response:
Considering that the dataframe looks like the following
import pandas as pd
df = pd.DataFrame({'Open': [177.3628, 176.5750, 176.1762, 175.7274, 175.3186, 155.9000, 154.9000, 154.5700, 154.3200, 154.5000],
'High': [177.5123, 176.6448, 176.1861, 175.7274, 175.6576, 155.9400, 155.2300, 154.6000, 154.8100, 154.5000],
'Low': [176.5451, 176.1163, 175.3984, 175.1192, 175.2987, 154.8400, 154.6000, 154.1300, 154.2400, 153.6700],
'Close': [176.5850, 176.1662, 175.7274, 175.3286, 175.5280, 154.9300, 154.6000, 154.3700, 154.5100, 153.7100],
'Volume': [10809, 2647, 2540, 3495, 1904, 3131, 3181, 3999, 4606, 10145]},
index=pd.to_datetime(['2022-03-31 09:30:00', '2022-03-31 09:35:00', '2022-03-31 09:40:00', '2022-03-31 09:45:00', '2022-03-31 09:50:00', '2022-09-21 15:35:00', '2022-09-21 15:40:00', '2022-09-21 15:45:00', '2022-09-21 15:50:00', '2022-09-21 15:55:00']))
[Out]:
Open High Low Close Volume
2022-03-31 09:30:00 177.3628 177.5123 176.5451 176.5850 10809
2022-03-31 09:35:00 176.5750 176.6448 176.1163 176.1662 2647
2022-03-31 09:40:00 176.1762 176.1861 175.3984 175.7274 2540
2022-03-31 09:45:00 175.7274 175.7274 175.1192 175.3286 3495
2022-03-31 09:50:00 175.3186 175.6576 175.2987 175.5280 1904
2022-09-21 15:35:00 155.9000 155.9400 154.8400 154.9300 3131
2022-09-21 15:40:00 154.9000 155.2300 154.6000 154.6000 3181
2022-09-21 15:45:00 154.5700 154.6000 154.1300 154.3700 3999
2022-09-21 15:50:00 154.3200 154.8100 154.2400 154.5100 4606
2022-09-21 15:55:00 154.5000 154.5000 153.6700 153.7100 10145
In this specific case, which I assume is the same as OP's, the dates are of type datetime64[ns] (if not, see the notes below) and are the index of the dataframe.
Therefore, one option would be to use a list and a set (an unordered collection with no duplicate elements) as follows
Dates = list(set(df.index.date.tolist()))
[Out]: [datetime.date(2022, 9, 21), datetime.date(2022, 3, 31)]
Then, if one wants to obtain the output as a list of strings, one can do the following
Dates = [str(date) for date in Dates]
[Out]: ['2022-09-21', '2022-03-31']
A one-liner to obtain the desired output would be like this
Dates = [str(date) for date in list(set(df.index.date.tolist()))]
[Out]: ['2022-09-21', '2022-03-31']
Even though it was not specified, if one wants the dates sorted, then use sorted() as follows
Dates = sorted([str(date) for date in list(set(df.index.date.tolist()))])
[Out]: ['2022-03-31', '2022-09-21']
Option 2
Even though OP doesn't want to use additional libraries, instead of list() and set() (and sorted()), one can use numpy.unique. This method returns the sorted unique elements of an array. It would be as follows
import numpy as np
Dates = [str(date) for date in np.unique(df.index.date).tolist()]
[Out]: ['2022-03-31', '2022-09-21']
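Both options above go through Python date objects. As a hedged alternative that stays inside pandas, the sketch below uses DatetimeIndex.normalize() to set every timestamp to midnight and then unique() to drop the repeats. The small frame and the variable names are illustrative only, not OP's real data.

```python
import pandas as pd

# Illustrative sample: a few intraday rows spread over two days.
idx = pd.to_datetime([
    "2022-03-31 09:30:00",
    "2022-03-31 09:35:00",
    "2022-09-21 15:35:00",
    "2022-09-21 15:40:00",
])
df = pd.DataFrame({"Close": [176.5850, 176.1662, 154.9300, 154.6000]}, index=idx)

# normalize() resets the time of day to 00:00:00, unique() keeps the first
# occurrence of each day, and strftime() formats the result as strings.
dates = df.index.normalize().unique().strftime("%Y-%m-%d").tolist()
print(dates)  # ['2022-03-31', '2022-09-21']
```

Unlike set(), unique() keeps the order of first appearance, so an index that is already chronological needs no extra sorted() call.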
Notes:
If the index is not of datetime type, one can do the following
df.index = pd.to_datetime(df.index)
If one wants to specify different date formats, such as minutes, one would have to use pandas.DatetimeIndex.strftime with specific Python string formats as follows
Dates = sorted([str(date) for date in list(set(df.index.strftime('%Y-%m-%d %H:%M').tolist()))])
[Out]: ['2022-03-31 09:30', '2022-03-31 09:35', '2022-03-31 09:40', '2022-03-31 09:45', '2022-03-31 09:50', '2022-09-21 15:35', '2022-09-21 15:40', '2022-09-21 15:45', '2022-09-21 15:50', '2022-09-21 15:55']
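To make the first note concrete, here is a minimal sketch that assumes the index arrived as plain strings (for example after reading a CSV without date parsing). The frame called raw and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index holds timestamp strings, not datetimes.
raw = pd.DataFrame(
    {"Close": [176.5850, 176.1662, 154.9300]},
    index=["2022-03-31 09:30:00", "2022-03-31 09:35:00", "2022-09-21 15:35:00"],
)

# Convert only when needed, so an existing DatetimeIndex is left alone.
if not isinstance(raw.index, pd.DatetimeIndex):
    raw.index = pd.to_datetime(raw.index)

# After the conversion the .date accessor is available on the index.
unique_days = sorted({str(d) for d in raw.index.date})
print(unique_days)  # ['2022-03-31', '2022-09-21']
```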
CodePudding user response:
There are quite a few ways to solve this. Technically, the easiest is to reset the index and do l = list(df.reset_index().Date.apply(lambda x: x.date()).unique())
reset_index() returns a new DataFrame, so you do not have to modify the dataframe object to do this.
You could also just do something like:
l = []
for x in df.index:
    d = x.date()  # drop the time of day so that only the day is compared
    if d not in l:
        l.append(d)
Here is an example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.zeros((10,10)))
list(df.reset_index()[0].apply(lambda x:x).unique())
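The loop idea in this answer (keep a value only the first time it is seen) can also be sketched with dict.fromkeys, which removes duplicates while preserving insertion order. The tiny frame below is illustrative and deliberately not in chronological order, to show that the original order survives.

```python
import datetime
import pandas as pd

# Illustrative index that is intentionally out of chronological order.
idx = pd.to_datetime([
    "2022-09-21 15:35:00",
    "2022-03-31 09:30:00",
    "2022-09-21 15:40:00",
])
df = pd.DataFrame({"Close": [154.93, 176.585, 154.6]}, index=idx)

# df.index.date is an array of datetime.date objects; dict keys are unique
# and keep the order in which they were first inserted.
unique_days = list(dict.fromkeys(df.index.date))
print(unique_days)
```

This avoids the repeated "not in l" list scan, which can become slow on a long intraday index.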
CodePudding user response:
Here is one way to do it. Assumption: you're looking for unique dates, without time.
dates = sorted(list(set(pd.to_datetime(df.index).strftime('%Y-%m-%d'))))
dates
['2022-03-31', '2022-09-21']
CodePudding user response:
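You could use the unique() method on any DataFrame column you desire; it returns an array of the unique values in that column. Note that this assumes Date is a regular column rather than the index, and that it keeps the time of day: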
Dates = df["Date"].unique()
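When Date is a regular column, the values can be reduced to days before calling unique(). The following is a rough sketch under that assumption, with a made-up frame; it uses the .dt accessor, which is only available on a datetime column.

```python
import pandas as pd

# Made-up frame where Date is an ordinary column, not the index.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-03-31 09:30:00",
        "2022-03-31 09:35:00",
        "2022-09-21 15:35:00",
    ]),
    "Close": [176.5850, 176.1662, 154.9300],
})

# Format each timestamp as a day string, then drop the repeats.
dates = list(df["Date"].dt.strftime("%Y-%m-%d").unique())
print(dates)  # ['2022-03-31', '2022-09-21']
```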