How to obtain unique dates as a list from DataFrame with Datetime index

I would like to return a list of all unique dates from the index, i.e. the timestamps reduced to days. I am having trouble because the Date column is set as the index of the DataFrame, and I want to do it without having to reset the index.

This is a sample of my DataFrame.

Input:

                         Open          High     Low         Close       Volume
    Date                    
    2022-03-31 09:30:00 177.3628    177.5123    176.5451    176.5850    10809
    2022-03-31 09:35:00 176.5750    176.6448    176.1163    176.1662    2647
    2022-03-31 09:40:00 176.1762    176.1861    175.3984    175.7274    2540
    2022-03-31 09:45:00 175.7274    175.7274    175.1192    175.3286    3495
    2022-03-31 09:50:00 175.3186    175.6576    175.2987    175.5280    1904
    ... ... ... ... ... ...
    2022-09-21 15:35:00 155.9000    155.9400    154.8400    154.9300    3131
    2022-09-21 15:40:00 154.9000    155.2300    154.6000    154.6000    3181
    2022-09-21 15:45:00 154.5700    154.6000    154.1300    154.3700    3999
    2022-09-21 15:50:00 154.3200    154.8100    154.2400    154.5100    4606
    2022-09-21 15:55:00 154.5000    154.5000    153.6700    153.7100    10145

Desired output:

    Dates = ["2022-03-31", "2022-04-01", "2022-04-02", ...]

CodePudding user response：

Considering that the dataframe looks like the following

import pandas as pd

df = pd.DataFrame({'Open': [177.3628, 176.5750, 176.1762, 175.7274, 175.3186, 155.9000, 154.9000, 154.5700, 154.3200, 154.5000],
                     'High': [177.5123, 176.6448, 176.1861, 175.7274, 175.6576, 155.9400, 155.2300, 154.6000, 154.8100, 154.5000],
                        'Low': [176.5451, 176.1163, 175.3984, 175.1192, 175.2987, 154.8400, 154.6000, 154.1300, 154.2400, 153.6700],
                        'Close': [176.5850, 176.1662, 175.7274, 175.3286, 175.5280, 154.9300, 154.6000, 154.3700, 154.5100, 153.7100],
                        'Volume': [10809, 2647, 2540, 3495, 1904, 3131, 3181, 3999, 4606, 10145]},
                        index=pd.to_datetime(['2022-03-31 09:30:00', '2022-03-31 09:35:00', '2022-03-31 09:40:00', '2022-03-31 09:45:00', '2022-03-31 09:50:00', '2022-09-21 15:35:00', '2022-09-21 15:40:00', '2022-09-21 15:45:00', '2022-09-21 15:50:00', '2022-09-21 15:55:00']))

[Out]:

                         Open      High       Low     Close  Volume
2022-03-31 09:30:00  177.3628  177.5123  176.5451  176.5850   10809
2022-03-31 09:35:00  176.5750  176.6448  176.1163  176.1662    2647
2022-03-31 09:40:00  176.1762  176.1861  175.3984  175.7274    2540
2022-03-31 09:45:00  175.7274  175.7274  175.1192  175.3286    3495
2022-03-31 09:50:00  175.3186  175.6576  175.2987  175.5280    1904
2022-09-21 15:35:00  155.9000  155.9400  154.8400  154.9300    3131
2022-09-21 15:40:00  154.9000  155.2300  154.6000  154.6000    3181
2022-09-21 15:45:00  154.5700  154.6000  154.1300  154.3700    3999
2022-09-21 15:50:00  154.3200  154.8100  154.2400  154.5100    4606
2022-09-21 15:55:00  154.5000  154.5000  153.6700  153.7100   10145

In this specific case, which I assume is the same as OP's, the dates are of type datetime64[ns] (if not, see the notes below) and are the index of the dataframe.

Therefore, one option would be using a list and a set (unordered collection with no duplicate elements) as follows

Dates = list(set(df.index.date.tolist()))

[Out]: [datetime.date(2022, 9, 21), datetime.date(2022, 3, 31)]

Then, if one wants to obtain the output as a list of strings, one can do the following

Dates = [str(date) for date in Dates]

[Out]: ['2022-09-21', '2022-03-31']

A one-liner to obtain the desired output would be like this

Dates = [str(date) for date in list(set(df.index.date.tolist()))]

[Out]: ['2022-09-21', '2022-03-31']

Even though it was not specified, if one wants the dates sorted, then use sorted() as follows

Dates = sorted([str(date) for date in list(set(df.index.date.tolist()))])

[Out]: ['2022-03-31', '2022-09-21']
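For comparison, the same result can be sketched with pandas index methods only. This is a minimal sketch, assuming a small stand-in index built from a few of the question's sample timestamps; `normalize()`, `unique()`, `sort_values()` and `strftime()` are existing `DatetimeIndex`/`Index` methods, while the variable names are illustrative.

```python
import pandas as pd

# Small stand-in for the question's intraday index (values taken from the sample data).
idx = pd.to_datetime([
    "2022-03-31 09:30:00",
    "2022-03-31 09:35:00",
    "2022-09-21 15:50:00",
    "2022-09-21 15:55:00",
])
df = pd.DataFrame({"Close": [176.5850, 176.1662, 154.5100, 153.7100]}, index=idx)

# normalize() sets every timestamp to midnight, unique() drops the repeats,
# and sort_values() makes the ordering explicit.
unique_days = df.index.normalize().unique().sort_values()

# strftime() renders the remaining timestamps as plain 'YYYY-MM-DD' strings.
dates = unique_days.strftime("%Y-%m-%d").tolist()
print(dates)
```

Staying on the index avoids building Python date objects one by one, which may matter for long intraday histories.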

Option 2

If using an additional library is acceptable, then instead of list() and set() (and sorted()) one can use numpy.unique, which returns the sorted unique elements of an array. It would be as follows

import numpy as np

Dates = [str(date) for date in np.unique(df.index.date).tolist()]

[Out]: ['2022-03-31', '2022-09-21']

Notes:

If the index is not of datetime type, one can convert it as follows
```
df.index = pd.to_datetime(df.index)
```

If one wants a different format, for example one that keeps the hours and minutes, one can use pandas.DatetimeIndex.strftime with the usual Python format codes as follows (strftime already returns strings, so no extra str() call is needed)

Dates = sorted(set(df.index.strftime('%Y-%m-%d %H:%M')))

[Out]: ['2022-03-31 09:30', '2022-03-31 09:35', '2022-03-31 09:40', '2022-03-31 09:45', '2022-03-31 09:50', '2022-09-21 15:35', '2022-09-21 15:40', '2022-09-21 15:45', '2022-09-21 15:50', '2022-09-21 15:55']
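The conversion from the first note can be guarded so that it only runs when it is needed. A minimal sketch, assuming a hypothetical frame whose index was read in as plain strings (for example from a CSV file):

```python
import pandas as pd

# Hypothetical frame whose index holds strings rather than timestamps.
df = pd.DataFrame(
    {"Close": [176.5850, 153.7100]},
    index=["2022-03-31 09:30:00", "2022-09-21 15:55:00"],
)

# Convert only when needed, so an existing DatetimeIndex is left untouched.
if not isinstance(df.index, pd.DatetimeIndex):
    df.index = pd.to_datetime(df.index)

# With a DatetimeIndex in place, the .date accessor used above is available.
dates = sorted({str(d) for d in df.index.date})
print(dates)
```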

CodePudding user response：

There are quite a few ways to solve this. Technically the easiest is to reset the index (this assumes the index is named Date, as in the question):

l = list(df.reset_index().Date.apply(lambda x: x.date()).unique())

reset_index() returns a new DataFrame, so you do not have to modify the original dataframe object to do this.

You could also just do something like:

l = []
for x in df.index:
    d = x.date()  # drop the time part so that only the day is compared
    if d not in l:
        l.append(d)
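A loop that checks membership in a list rescans the list on every iteration, which can get slow on long intraday indexes. As a hedged alternative, an order-preserving de-duplication of the calendar dates can be sketched with dict.fromkeys, which keeps the first occurrence of each key (dictionaries preserve insertion order in Python 3.7+). The index below is a made-up, deliberately unsorted stand-in.

```python
import pandas as pd

# Deliberately unsorted stand-in index, to show that first-seen order is kept.
idx = pd.to_datetime([
    "2022-09-21 15:50:00",
    "2022-03-31 09:30:00",
    "2022-09-21 15:55:00",
    "2022-03-31 09:35:00",
])

# idx.date yields datetime.date objects; dict keys are unique and ordered.
unique_days = list(dict.fromkeys(idx.date))

# str() on a datetime.date gives the ISO 'YYYY-MM-DD' form.
dates = [str(d) for d in unique_days]
print(dates)
```

Unlike set(), this keeps the dates in order of first appearance, so no separate sort is needed when the index is already chronological.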

Here is a minimal example of the same reset_index pattern on a dummy frame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((10,10)))
list(df.reset_index()[0].apply(lambda x:x).unique())

CodePudding user response：

Here is one way to do it. Assumption: you're looking for unique dates, without time.

dates = sorted(list(set(pd.to_datetime(df.index).strftime('%Y-%m-%d'))))
dates

['2022-03-31', '2022-09-21']

CodePudding user response：

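You could use the unique() method, which returns the unique values of a Series or Index (as an array or Index rather than a list). Note that Date is the index here, not a column, so df["Date"] would raise a KeyError; call unique() on the dates taken from the index instead:

Dates = pd.Series(df.index.date).unique().tolist()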