I have OHLC dataseries, and I'd like to move to 'one' column series to sorting OHLC also.
Datas:
Time Open High Low Close Volume
0 2022-04-30 13:05:00 38580.46 38610.44 38580.46 38610.44 33.14334
1 2022-04-30 13:06:00 38610.43 38610.44 38600.00 38600.00 10.64336
2 2022-04-30 13:07:00 38600.00 38604.17 38600.00 38604.16 11.41531
3 2022-04-30 13:08:00 38604.16 38607.97 38604.16 38604.16 9.25056
4 2022-04-30 13:09:00 38604.16 38606.81 38604.16 38606.80 6.83944
5 2022-04-30 13:10:00 38606.81 38606.81 38602.00 38602.00 8.55118
6 2022-04-30 13:11:00 38602.01 38602.01 38602.00 38602.01 6.45035
7 2022-04-30 13:12:00 38602.01 38602.01 38600.28 38600.28 5.33661
8 2022-04-30 13:13:00 38600.28 38606.80 38600.28 38606.80 11.11430
9 2022-04-30 13:14:00 38606.80 38606.80 38600.56 38600.57 9.40830
I did it with for loop, but I think this very slow on big data.
df1 = pd.DataFrame(ohlcv, columns = ['Time', 'Open', 'High', 'Low', 'Close', 'Volume'])
df_tick=pd.DataFrame(columns=['Time', 'Price', 'Volume'])
for i in range(0,len(df1)):
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Open'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['High'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Low'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Close'][i]}])], ignore_index=True)
The destination data looks like this:
Time Price Volume
0 2022-04-30 13:05:00 38580.46 NaN
1 2022-04-30 13:05:00 38610.44 NaN
2 2022-04-30 13:05:00 38580.46 NaN
3 2022-04-30 13:05:00 38610.44 NaN
4 2022-04-30 13:06:00 38610.43 NaN
5 2022-04-30 13:06:00 38610.44 NaN
6 2022-04-30 13:06:00 38600.0 NaN
7 2022-04-30 13:06:00 38600.0 NaN
8 2022-04-30 13:07:00 38600.0 NaN
9 2022-04-30 13:07:00 38604.17 NaN
10 2022-04-30 13:07:00 38600.0 NaN
11 2022-04-30 13:07:00 38604.16 NaN
12 2022-04-30 13:08:00 38604.16 NaN
13 2022-04-30 13:08:00 38607.97 NaN
14 2022-04-30 13:08:00 38604.16 NaN
15 2022-04-30 13:08:00 38604.16 NaN
16 2022-04-30 13:09:00 38604.16 NaN
17 2022-04-30 13:09:00 38606.81 NaN
18 2022-04-30 13:09:00 38604.16 NaN
I tried with simple concat, but sorting with index, soring values also.
df_tick=pd.concat([df1['Open'], df1['High'],df1['Low'],df1['Close']], axis=0)
df_tick=df_tick.sort_index()
How to sorting only index in the order of entry?
CodePudding user response:
I think there is no need to use for-loop, you can use stack
to simplify your code plus stack
also preserve the order of OHLC
prices so there is no need to sort afterwards:
df.set_index(['Time', 'Volume']).stack().droplevel(-1).reset_index(name='Price')
Result
Time Volume Price
0 2022-04-30 13:05:00 33.14334 38580.46
1 2022-04-30 13:05:00 33.14334 38610.44
2 2022-04-30 13:05:00 33.14334 38580.46
3 2022-04-30 13:05:00 33.14334 38610.44
4 2022-04-30 13:06:00 10.64336 38610.43
5 2022-04-30 13:06:00 10.64336 38610.44
6 2022-04-30 13:06:00 10.64336 38600.00
7 2022-04-30 13:06:00 10.64336 38600.00
8 2022-04-30 13:07:00 11.41531 38600.00
9 2022-04-30 13:07:00 11.41531 38604.17
10 2022-04-30 13:07:00 11.41531 38600.00
11 2022-04-30 13:07:00 11.41531 38604.16
12 2022-04-30 13:08:00 9.25056 38604.16
13 2022-04-30 13:08:00 9.25056 38607.97
14 2022-04-30 13:08:00 9.25056 38604.16
15 2022-04-30 13:08:00 9.25056 38604.16
16 2022-04-30 13:09:00 6.83944 38604.16
17 2022-04-30 13:09:00 6.83944 38606.81
18 2022-04-30 13:09:00 6.83944 38604.16
...
38 2022-04-30 13:14:00 9.40830 38600.56
39 2022-04-30 13:14:00 9.40830 38600.57
CodePudding user response:
Taking a cue from @shubhamsharma's explanation, one option where you do not need to convert to categorical dtype, and maintain the order is with pivot_longer from pyjanitor
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(index = ['Time', 'Volume'],
values_to = 'Price',
sort_by_appearance=True)
.drop(columns='variable')
)
Time Volume Price
0 2022-04-30 13:05:00 33.14334 38580.46
1 2022-04-30 13:05:00 33.14334 38610.44
2 2022-04-30 13:05:00 33.14334 38580.46
3 2022-04-30 13:05:00 33.14334 38610.44
4 2022-04-30 13:06:00 10.64336 38610.43
5 2022-04-30 13:06:00 10.64336 38610.44
6 2022-04-30 13:06:00 10.64336 38600.00
7 2022-04-30 13:06:00 10.64336 38600.00
8 2022-04-30 13:07:00 11.41531 38600.00
9 2022-04-30 13:07:00 11.41531 38604.17
10 2022-04-30 13:07:00 11.41531 38600.00
11 2022-04-30 13:07:00 11.41531 38604.16
12 2022-04-30 13:08:00 9.25056 38604.16
13 2022-04-30 13:08:00 9.25056 38607.97
14 2022-04-30 13:08:00 9.25056 38604.16
15 2022-04-30 13:08:00 9.25056 38604.16
16 2022-04-30 13:09:00 6.83944 38604.16
17 2022-04-30 13:09:00 6.83944 38606.81
18 2022-04-30 13:09:00 6.83944 38604.16
19 2022-04-30 13:09:00 6.83944 38606.80
20 2022-04-30 13:10:00 8.55118 38606.81
21 2022-04-30 13:10:00 8.55118 38606.81
22 2022-04-30 13:10:00 8.55118 38602.00
23 2022-04-30 13:10:00 8.55118 38602.00
24 2022-04-30 13:11:00 6.45035 38602.01
25 2022-04-30 13:11:00 6.45035 38602.01
26 2022-04-30 13:11:00 6.45035 38602.00
27 2022-04-30 13:11:00 6.45035 38602.01
28 2022-04-30 13:12:00 5.33661 38602.01
29 2022-04-30 13:12:00 5.33661 38602.01
30 2022-04-30 13:12:00 5.33661 38600.28
31 2022-04-30 13:12:00 5.33661 38600.28
32 2022-04-30 13:13:00 11.11430 38600.28
33 2022-04-30 13:13:00 11.11430 38606.80
34 2022-04-30 13:13:00 11.11430 38600.28
35 2022-04-30 13:13:00 11.11430 38606.80
36 2022-04-30 13:14:00 9.40830 38606.80
37 2022-04-30 13:14:00 9.40830 38606.80
38 2022-04-30 13:14:00 9.40830 38600.56
39 2022-04-30 13:14:00 9.40830 38600.57