I have a data frame that is indexed by DateTimeIndex
at the begining of my code. Which looks like the following:
import pandas as pd
import numpy as np
np.random.seed(0)
timeindex = pd.date_range('8/12/2022', '8/18/2022', freq='min')
df = pd.DataFrame(np.random.randn(len(timeindex)), columns=['value'], index=timeindex)
print(df.head())
value
2022-08-12 00:00:00 1.764052
2022-08-12 00:01:00 0.400157
2022-08-12 00:02:00 0.978738
2022-08-12 00:03:00 2.240893
2022-08-12 00:04:00 1.867558
Then I convert it to a dataframe that is index using time (axis=0) and date as the columns (axis=1) and do some analysis afterwards. It looks like the following:
df = df.set_index([df.index.date, df.index.time]).unstack(level=0)
print(df.head())
value
2022-08-12 2022-08-13 2022-08-14 2022-08-15 2022-08-16 2022-08-17
00:00:00 1.764052 0.147434 -0.691020 0.718456 1.198187 -0.088724
00:01:00 0.400157 -0.977465 -2.205505 1.576193 -0.336650 1.512770
00:02:00 0.978738 0.879390 0.447870 0.441869 0.841947 0.573708
00:03:00 2.240893 0.635425 -0.755751 -0.816861 0.543372 -0.541004
00:04:00 1.867558 0.542611 1.325708 0.745505 -0.962988 0.101177
Now, I want to convert my dataframe back to the original form as the first one. How can I do that? (I know if I read my dateframe again I will get it as the first form but I do not want to do it.)
My try I use the following but I do not know how to combine date and time:
df.unstack()
value 2022-08-12 00:00:00 1.764052
00:01:00 0.400157
00:02:00 0.978738
00:03:00 2.240893
00:04:00 1.867558
CodePudding user response:
df2 = df.unstack()
df2.index = df.unstack().index.droplevel(0).map(lambda x: f'{x[0]} {x[1]}')
df2 = df2.dropna()
df2
2022-08-12 00:00:00 1.764052
2022-08-12 00:01:00 0.400157
2022-08-12 00:02:00 0.978738
2022-08-12 00:03:00 2.240893
2022-08-12 00:04:00 1.867558
...
2022-08-17 23:56:00 -0.528628
2022-08-17 23:57:00 -0.561362
2022-08-17 23:58:00 0.459419
2022-08-17 23:59:00 0.461485
2022-08-18 00:00:00 -0.841556
CodePudding user response:
You can use pd.index.get_level_values
to concatenate your indexes and columns into their old form
new_df = df.unstack()
new_df.index = [new_df.index.get_level_values(1), new_df.index.map('{0[1]} {0[2]}'.format)]
However, this will not drop the level 0 multi-index, So you now have
2022-08-12 2022-08-12 00:00:00 1.764052
2022-08-12 00:01:00 0.400157
2022-08-12 00:02:00 0.978738
2022-08-12 00:03:00 2.240893
2022-08-12 00:04:00 1.867558
...
2022-08-18 2022-08-18 23:55:00 NaN
2022-08-18 23:56:00 NaN
2022-08-18 23:57:00 NaN
2022-08-18 23:58:00 NaN
2022-08-18 23:59:00 NaN
You can drop multi-index level with index.droplevel
new_df.index = new_df.index.droplevel(0)
Lastly, you will see you will have NaN
values at the end of the series as expected. These NaN
came when you applied df.unstack
at the beginning. You can either enter the last date if you know it, or just pass
new_df[new_df.index <= first_df.index[-1]]
Note that I intentionally called it first_df
. first_df
is simply the dataframe at the beginning that didn't go under any transformation