Home > Enterprise >  move 4 columns to one column, sort only index, not value
move 4 columns to one column, sort only index, not value

Time:05-02

I have OHLC dataseries, and I'd like to move to 'one' column series to sorting OHLC also.

Datas:

                 Time      Open      High       Low     Close    Volume
0 2022-04-30 13:05:00  38580.46  38610.44  38580.46  38610.44  33.14334
1 2022-04-30 13:06:00  38610.43  38610.44  38600.00  38600.00  10.64336
2 2022-04-30 13:07:00  38600.00  38604.17  38600.00  38604.16  11.41531
3 2022-04-30 13:08:00  38604.16  38607.97  38604.16  38604.16   9.25056
4 2022-04-30 13:09:00  38604.16  38606.81  38604.16  38606.80   6.83944
5 2022-04-30 13:10:00  38606.81  38606.81  38602.00  38602.00   8.55118
6 2022-04-30 13:11:00  38602.01  38602.01  38602.00  38602.01   6.45035
7 2022-04-30 13:12:00  38602.01  38602.01  38600.28  38600.28   5.33661
8 2022-04-30 13:13:00  38600.28  38606.80  38600.28  38606.80  11.11430
9 2022-04-30 13:14:00  38606.80  38606.80  38600.56  38600.57   9.40830

I did it with for loop, but I think this very slow on big data.

df1 = pd.DataFrame(ohlcv, columns = ['Time', 'Open', 'High', 'Low', 'Close', 'Volume'])
df_tick=pd.DataFrame(columns=['Time', 'Price', 'Volume'])

for i in range(0,len(df1)):
    df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Open'][i]}])],ignore_index=True)
    df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['High'][i]}])],ignore_index=True)
    df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Low'][i]}])],ignore_index=True)
    df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Close'][i]}])], ignore_index=True)

The destination data looks like this:

                   Time     Price Volume
0   2022-04-30 13:05:00  38580.46    NaN
1   2022-04-30 13:05:00  38610.44    NaN
2   2022-04-30 13:05:00  38580.46    NaN
3   2022-04-30 13:05:00  38610.44    NaN
4   2022-04-30 13:06:00  38610.43    NaN
5   2022-04-30 13:06:00  38610.44    NaN
6   2022-04-30 13:06:00   38600.0    NaN
7   2022-04-30 13:06:00   38600.0    NaN
8   2022-04-30 13:07:00   38600.0    NaN
9   2022-04-30 13:07:00  38604.17    NaN
10  2022-04-30 13:07:00   38600.0    NaN
11  2022-04-30 13:07:00  38604.16    NaN
12  2022-04-30 13:08:00  38604.16    NaN
13  2022-04-30 13:08:00  38607.97    NaN
14  2022-04-30 13:08:00  38604.16    NaN
15  2022-04-30 13:08:00  38604.16    NaN
16  2022-04-30 13:09:00  38604.16    NaN
17  2022-04-30 13:09:00  38606.81    NaN
18  2022-04-30 13:09:00  38604.16    NaN

I tried with simple concat, but sorting with index, soring values also.

df_tick=pd.concat([df1['Open'], df1['High'],df1['Low'],df1['Close']], axis=0)
df_tick=df_tick.sort_index()

How to sorting only index in the order of entry?

CodePudding user response:

I think there is no need to use for-loop, you can use stack to simplify your code plus stack also preserve the order of OHLC prices so there is no need to sort afterwards:

df.set_index(['Time', 'Volume']).stack().droplevel(-1).reset_index(name='Price')

Result

                   Time    Volume     Price
0   2022-04-30 13:05:00  33.14334  38580.46
1   2022-04-30 13:05:00  33.14334  38610.44
2   2022-04-30 13:05:00  33.14334  38580.46
3   2022-04-30 13:05:00  33.14334  38610.44
4   2022-04-30 13:06:00  10.64336  38610.43
5   2022-04-30 13:06:00  10.64336  38610.44
6   2022-04-30 13:06:00  10.64336  38600.00
7   2022-04-30 13:06:00  10.64336  38600.00
8   2022-04-30 13:07:00  11.41531  38600.00
9   2022-04-30 13:07:00  11.41531  38604.17
10  2022-04-30 13:07:00  11.41531  38600.00
11  2022-04-30 13:07:00  11.41531  38604.16
12  2022-04-30 13:08:00   9.25056  38604.16
13  2022-04-30 13:08:00   9.25056  38607.97
14  2022-04-30 13:08:00   9.25056  38604.16
15  2022-04-30 13:08:00   9.25056  38604.16
16  2022-04-30 13:09:00   6.83944  38604.16
17  2022-04-30 13:09:00   6.83944  38606.81
18  2022-04-30 13:09:00   6.83944  38604.16
...
38  2022-04-30 13:14:00   9.40830  38600.56
39  2022-04-30 13:14:00   9.40830  38600.57

CodePudding user response:

Taking a cue from @shubhamsharma's explanation, one option where you do not need to convert to categorical dtype, and maintain the order is with pivot_longer from pyjanitor

# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(index = ['Time', 'Volume'],
              values_to = 'Price',
              sort_by_appearance=True)
.drop(columns='variable')
)
                   Time    Volume     Price
0   2022-04-30 13:05:00  33.14334  38580.46
1   2022-04-30 13:05:00  33.14334  38610.44
2   2022-04-30 13:05:00  33.14334  38580.46
3   2022-04-30 13:05:00  33.14334  38610.44
4   2022-04-30 13:06:00  10.64336  38610.43
5   2022-04-30 13:06:00  10.64336  38610.44
6   2022-04-30 13:06:00  10.64336  38600.00
7   2022-04-30 13:06:00  10.64336  38600.00
8   2022-04-30 13:07:00  11.41531  38600.00
9   2022-04-30 13:07:00  11.41531  38604.17
10  2022-04-30 13:07:00  11.41531  38600.00
11  2022-04-30 13:07:00  11.41531  38604.16
12  2022-04-30 13:08:00   9.25056  38604.16
13  2022-04-30 13:08:00   9.25056  38607.97
14  2022-04-30 13:08:00   9.25056  38604.16
15  2022-04-30 13:08:00   9.25056  38604.16
16  2022-04-30 13:09:00   6.83944  38604.16
17  2022-04-30 13:09:00   6.83944  38606.81
18  2022-04-30 13:09:00   6.83944  38604.16
19  2022-04-30 13:09:00   6.83944  38606.80
20  2022-04-30 13:10:00   8.55118  38606.81
21  2022-04-30 13:10:00   8.55118  38606.81
22  2022-04-30 13:10:00   8.55118  38602.00
23  2022-04-30 13:10:00   8.55118  38602.00
24  2022-04-30 13:11:00   6.45035  38602.01
25  2022-04-30 13:11:00   6.45035  38602.01
26  2022-04-30 13:11:00   6.45035  38602.00
27  2022-04-30 13:11:00   6.45035  38602.01
28  2022-04-30 13:12:00   5.33661  38602.01
29  2022-04-30 13:12:00   5.33661  38602.01
30  2022-04-30 13:12:00   5.33661  38600.28
31  2022-04-30 13:12:00   5.33661  38600.28
32  2022-04-30 13:13:00  11.11430  38600.28
33  2022-04-30 13:13:00  11.11430  38606.80
34  2022-04-30 13:13:00  11.11430  38600.28
35  2022-04-30 13:13:00  11.11430  38606.80
36  2022-04-30 13:14:00   9.40830  38606.80
37  2022-04-30 13:14:00   9.40830  38606.80
38  2022-04-30 13:14:00   9.40830  38600.56
39  2022-04-30 13:14:00   9.40830  38600.57
  • Related