I have a pandas DataFrameGroupBy (df_groups) that I have created by grouping another dataframe (df_pub) containing a list of publications by their day/month/year index.
df_groups = df_pub.groupby(by=df_pub.index.day, df_pub.index.month,df_pub.index.year],sort=False)
I then want to check how many unique publications exist in each group, so I use:
n_unique_pub = df_groups.Title.nunique()
This is a pandas series with a MutiIndex that looks like the following:
MultiIndex([( 1, 7, 2020),
( 2, 7, 2020),
( 3, 7, 2020),
( 4, 7, 2020),
( 5, 7, 2020),
( 6, 7, 2020),
( 7, 7, 2020),
( 8, 7, 2020),
( 9, 7, 2020),
(10, 7, 2020),
...
( 8, 11, 2021),
( 9, 11, 2021),
(10, 11, 2021),
(11, 11, 2021),
(12, 11, 2021),
(13, 11, 2021),
(14, 11, 2021),
(15, 11, 2021),
(16, 11, 2021),
(17, 11, 2021)],
names=['Date', 'Date', 'Date'], length=497)
I would like to convert this MultiIndex into a DatetimeIndex so that it would look like:
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
'2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
'2020-07-09', '2020-07-10',
...
'2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
'2021-11-12', '2021-11-13', '2021-11-14', '2021-11-15',
'2021-11-16', '2021-11-17'],
dtype='datetime64[ns]', name='Date', length=505, freq='D')
Is there a simple way to do it? So far I've tried several approaches but nothing has worked. For example if I do pd.to_datetime(n_unique_pub.index)
I have an error: TypeError: <class 'tuple'> is not convertible to datetime
.
CodePudding user response:
Use pd.to_datetime
:
# mi is your MultiIndex instance, like mi = df.index
>>> pd.DatetimeIndex(pd.to_datetime(mi.rename(['day', 'month', 'year']).to_frame()))
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
'2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
'2020-07-09', '2020-07-10', '2021-11-08', '2021-11-09',
'2021-11-10', '2021-11-11', '2021-11-12', '2021-11-13',
'2021-11-14', '2021-11-15', '2021-11-16', '2021-11-17'],
dtype='datetime64[ns]', freq=None)
How to replace a MultiIndex to DatetimeIndex:
idx = pd.to_datetime(df.index.rename(['day', 'month', 'year']).to_frame())
df = df.set_index(idx)
print(df)
# Output:
A
2020-07-01 0.961038
2020-07-02 0.098132
2020-07-03 0.406996
2020-07-04 0.008376
2020-07-05 0.568059
2020-07-06 0.576610
2020-07-07 0.137144
2020-07-08 0.672219
2020-07-09 0.142874
2020-07-10 0.509231
2021-11-08 0.368762
2021-11-09 0.249107
2021-11-10 0.136282
2021-11-11 0.119291
2021-11-12 0.052388
2021-11-13 0.434899
2021-11-14 0.770705
2021-11-15 0.850914
2021-11-16 0.621283
2021-11-17 0.379888
CodePudding user response:
You can convert to 'YYYY-MM-DD' format first:
idx = pd.MultiIndex.from_tuples(
[( 1, 7, 2020),
( 2, 7, 2020),]
)
pd.to_datetime(idx.map(lambda x: '-'.join(map(str, reversed(x)))))
Output:
DatetimeIndex(['2020-07-01', '2020-07-02'], dtype='datetime64[ns]', freq=None)
CodePudding user response:
something like should work
dates = pd.to_datetime(df_groups.reset_index()[['year','month','day']])