Home > Blockchain >  Convert unstacked multi indexed dataframe to a single index as DateTimeIndex
Convert unstacked multi indexed dataframe to a single index as DateTimeIndex

Time:08-15

I have a data frame that is indexed by DateTimeIndex at the begining of my code. Which looks like the following:

import pandas as pd
import numpy as np
np.random.seed(0)
timeindex = pd.date_range('8/12/2022', '8/18/2022', freq='min')
df = pd.DataFrame(np.random.randn(len(timeindex)), columns=['value'], index=timeindex)
print(df.head())
                        value
2022-08-12 00:00:00  1.764052
2022-08-12 00:01:00  0.400157
2022-08-12 00:02:00  0.978738
2022-08-12 00:03:00  2.240893
2022-08-12 00:04:00  1.867558

Then I convert it to a dataframe that is index using time (axis=0) and date as the columns (axis=1) and do some analysis afterwards. It looks like the following:

df = df.set_index([df.index.date, df.index.time]).unstack(level=0)
print(df.head())
             value                                                         
         2022-08-12 2022-08-13 2022-08-14 2022-08-15 2022-08-16 2022-08-17   
00:00:00   1.764052   0.147434  -0.691020   0.718456   1.198187  -0.088724   
00:01:00   0.400157  -0.977465  -2.205505   1.576193  -0.336650   1.512770   
00:02:00   0.978738   0.879390   0.447870   0.441869   0.841947   0.573708   
00:03:00   2.240893   0.635425  -0.755751  -0.816861   0.543372  -0.541004   
00:04:00   1.867558   0.542611   1.325708   0.745505  -0.962988   0.101177   

Now, I want to convert my dataframe back to the original form as the first one. How can I do that? (I know if I read my dateframe again I will get it as the first form but I do not want to do it.)

My try I use the following but I do not know how to combine date and time:

df.unstack()
value  2022-08-12  00:00:00    1.764052
                   00:01:00    0.400157
                   00:02:00    0.978738
                   00:03:00    2.240893
                   00:04:00    1.867558

CodePudding user response:

df2 =  df.unstack()
df2.index = df.unstack().index.droplevel(0).map(lambda x: f'{x[0]} {x[1]}')
df2 = df2.dropna()
df2

2022-08-12 00:00:00    1.764052
2022-08-12 00:01:00    0.400157
2022-08-12 00:02:00    0.978738
2022-08-12 00:03:00    2.240893
2022-08-12 00:04:00    1.867558
                         ...   
2022-08-17 23:56:00   -0.528628
2022-08-17 23:57:00   -0.561362
2022-08-17 23:58:00    0.459419
2022-08-17 23:59:00    0.461485
2022-08-18 00:00:00   -0.841556

CodePudding user response:

You can use pd.index.get_level_values to concatenate your indexes and columns into their old form

new_df =  df.unstack()
new_df.index = [new_df.index.get_level_values(1), new_df.index.map('{0[1]} {0[2]}'.format)]

However, this will not drop the level 0 multi-index, So you now have

2022-08-12  2022-08-12 00:00:00    1.764052
            2022-08-12 00:01:00    0.400157
            2022-08-12 00:02:00    0.978738
            2022-08-12 00:03:00    2.240893
            2022-08-12 00:04:00    1.867558
                                 ...   
2022-08-18  2022-08-18 23:55:00         NaN
            2022-08-18 23:56:00         NaN
            2022-08-18 23:57:00         NaN
            2022-08-18 23:58:00         NaN
            2022-08-18 23:59:00         NaN

You can drop multi-index level with index.droplevel

new_df.index = new_df.index.droplevel(0)

Lastly, you will see you will have NaN values at the end of the series as expected. These NaN came when you applied df.unstack at the beginning. You can either enter the last date if you know it, or just pass

new_df[new_df.index <= first_df.index[-1]]

Note that I intentionally called it first_df. first_df is simply the dataframe at the beginning that didn't go under any transformation

  • Related