This is snippet of the dataframe I am using:
type date time open close change high low 200ema 50ema
0 sixty-min 2007-06-04 09:00:00 1536.28 1534.71 -0.102 0.000 -0.259 NaN 1522.90
1 sixty-min 2007-06-04 10:00:00 1534.87 1534.79 -0.005 0.109 -0.106 NaN 1523.37
2 sixty-min 2007-06-04 11:00:00 1534.88 1536.08 0.078 0.124 -0.023 NaN 1523.87
3 sixty-min 2007-06-04 12:00:00 1536.21 1537.30 0.071 0.118 -0.036 NaN 1524.39
4 sixty-min 2007-06-04 13:00:00 1537.31 1536.23 -0.070 0.011 -0.130 NaN 1524.86
5 sixty-min 2007-06-04 14:00:00 1536.25 1536.91 0.043 0.096 -0.078 NaN 1525.33
6 sixty-min 2007-06-04 15:00:00 1536.53 1539.10 0.167 0.260 0.000 NaN 1525.87
7 sixty-min 2007-06-04 16:00:00 1539.00 1539.18 0.012 0.012 0.000 NaN 1526.39
8 sixty-min 2007-06-05 09:00:00 1539.12 1533.15 -0.389 0.000 -0.456 NaN 1526.66
9 sixty-min 2007-06-05 10:00:00 1533.16 1534.77 0.105 0.160 -0.178 NaN 1526.97
What I would like to do is to compile this dataframe into a dataframe with just one row. It would have the following columns:
[ 'date' '60 9 open,' '60 9 close,' '60 9 change,' '60 9 high', '60 9 low', '60 9 200ema', '60 9 50ema',
'60 10 open', '60 10 close', '60 10 change', '60 10 high', '60 10 low', '60 10 200ema', '60 10 50ema',
'60 11 open', '60 11 close', '60 11 change', '60 11 high', '60 11 low', '60 11 200ema', '60 11 50ema',
'60 12 open', '60 12 close', '60 12 change', '60 12 high', '60 12 low', '60 12 200ema', '60 12 50ema',
'60 13 open', '60 13 close', '60 13 change', '60 13 high', '60 13 low', '60 13 200ema', '60 13 50ema',
'60 14 open', '60 14 close', '60 14 change', '60 14 high', '60 14 low', '60 14 200ema', '60 14 50ema',
'60 15 open', '60 15 close', '60 15 change', '60 15 high', '60 15 low', '60 15 200ema', '60 15 50ema',
'60 16 open', '60 16 close', '60 16 change', '60 16 high', '60 16 low', '60 16 200ema', '60 16 50ema',]
The difference would be that there is just one date on the row and no type, and there is a heading based on the data type/ time of each cell.
CodePudding user response:
You can first extract the hour as int and group by date:
df['time'] = pd.to_datetime(df['time']).dt.hour
df = df.groupby('date').agg(list)
Then for each date concatenate (along column/axis1) dataframes created from each column. Finally concatenate (along rows/axis0) the dataframes for all dates:
df_out = pd.concat([
pd.concat([pd.DataFrame([row[col]], index=[index],
columns=[f'60 {h} {col}' for h in row['time']])
for col in row.index[1:]], axis=1)
for index, row in df.iterrows()
])
Output:
60 9 open 60 10 open 60 11 open 60 12 open 60 13 open 60 14 open 60 15 open ... 60 10 50ema 60 11 50ema 60 12 50ema 60 13 50ema 60 14 50ema 60 15 50ema 60 16 50ema
2007-06-04 1536.28 1534.87 1534.88 1536.21 1537.31 1536.25 1536.53 ... 1523.37 1523.87 1524.39 1524.86 1525.33 1525.87 1526.39
2007-06-05 1539.12 1533.16 NaN NaN NaN NaN NaN ... 1526.97 NaN NaN NaN NaN NaN NaN