Home > Back-end >  Convert MultiIndex to DatetimeIndex in grouped dataframe
Convert MultiIndex to DatetimeIndex in grouped dataframe

Time:12-24

I have a pandas DataFrameGroupBy (df_groups) that I have created by grouping another dataframe (df_pub) containing a list of publications by their day/month/year index.

df_groups = df_pub.groupby(by=df_pub.index.day, df_pub.index.month,df_pub.index.year],sort=False)

I then want to check how many unique publications exist in each group, so I use:

n_unique_pub = df_groups.Title.nunique()

This is a pandas series with a MutiIndex that looks like the following:

MultiIndex([( 1,  7, 2020),
            ( 2,  7, 2020),
            ( 3,  7, 2020),
            ( 4,  7, 2020),
            ( 5,  7, 2020),
            ( 6,  7, 2020),
            ( 7,  7, 2020),
            ( 8,  7, 2020),
            ( 9,  7, 2020),
            (10,  7, 2020),
            ...
            ( 8, 11, 2021),
            ( 9, 11, 2021),
            (10, 11, 2021),
            (11, 11, 2021),
            (12, 11, 2021),
            (13, 11, 2021),
            (14, 11, 2021),
            (15, 11, 2021),
            (16, 11, 2021),
            (17, 11, 2021)],
           names=['Date', 'Date', 'Date'], length=497)

I would like to convert this MultiIndex into a DatetimeIndex so that it would look like:

DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
               '2020-07-09', '2020-07-10',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-13', '2021-11-14', '2021-11-15',
               '2021-11-16', '2021-11-17'],
              dtype='datetime64[ns]', name='Date', length=505, freq='D')

Is there a simple way to do it? So far I've tried several approaches but nothing has worked. For example if I do pd.to_datetime(n_unique_pub.index) I have an error: TypeError: <class 'tuple'> is not convertible to datetime.

CodePudding user response:

Use pd.to_datetime:

# mi is your MultiIndex instance, like mi = df.index
>>> pd.DatetimeIndex(pd.to_datetime(mi.rename(['day', 'month', 'year']).to_frame()))

DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
               '2020-07-09', '2020-07-10', '2021-11-08', '2021-11-09',
               '2021-11-10', '2021-11-11', '2021-11-12', '2021-11-13',
               '2021-11-14', '2021-11-15', '2021-11-16', '2021-11-17'],
              dtype='datetime64[ns]', freq=None)

How to replace a MultiIndex to DatetimeIndex:

idx = pd.to_datetime(df.index.rename(['day', 'month', 'year']).to_frame())
df = df.set_index(idx)
print(df)

# Output:
                   A
2020-07-01  0.961038
2020-07-02  0.098132
2020-07-03  0.406996
2020-07-04  0.008376
2020-07-05  0.568059
2020-07-06  0.576610
2020-07-07  0.137144
2020-07-08  0.672219
2020-07-09  0.142874
2020-07-10  0.509231
2021-11-08  0.368762
2021-11-09  0.249107
2021-11-10  0.136282
2021-11-11  0.119291
2021-11-12  0.052388
2021-11-13  0.434899
2021-11-14  0.770705
2021-11-15  0.850914
2021-11-16  0.621283
2021-11-17  0.379888

CodePudding user response:

You can convert to 'YYYY-MM-DD' format first:

idx = pd.MultiIndex.from_tuples(
    [( 1,  7, 2020),
     ( 2,  7, 2020),]
    )

pd.to_datetime(idx.map(lambda x: '-'.join(map(str, reversed(x)))))

Output:

DatetimeIndex(['2020-07-01', '2020-07-02'], dtype='datetime64[ns]', freq=None)

CodePudding user response:

something like should work

dates = pd.to_datetime(df_groups.reset_index()[['year','month','day']]) 
  • Related