Suppose I have an example dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))
A B C D
0 91 96 36 89
1 17 18 40 97
2 38 12 22 63
3 38 13 17 96
4 48 68 65 59
5 45 28 65 79
6 49 73 36 20
7 6 19 11 87
8 90 19 49 74
9 93 35 97 55
10 28 80 27 40
11 74 42 14 26
12 81 12 28 53
13 63 63 60 61
14 10 54 39 23
And I want to split it into a list of equal-sized dataframes with a staggered delay that increases each time (skip one row after the first chunk, two rows after the second, and so on), as in:
A B C D
0 91 96 36 89
1 17 18 40 97
2 38 12 22 63
3 38 13 17 96
A B C D
5 45 28 65 79
6 49 73 36 20
7 6 19 11 87
8 90 19 49 74
A B C D
11 74 42 14 26
12 81 12 28 53
13 63 63 60 61
14 10 54 39 23
What would be an elegant way of doing so? I am envisioning creating an extra column with a certain value at the rows in which I would like to make the splits, but this seems a bit clunky and kind of hack-job-y. Any ideas?
Thank you.
CodePudding user response:
The so-called "delay" is given by the running counter in this example:
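num_rows = 4
n = len(df) // num_rows
dfs = []
counter = 0  # total rows skipped so far
for i in range(n):
    counter += i  # the gap grows by one row per chunk: 0, 1, 3, ...
    start = num_rows * i + counter
    # .loc slices by label and includes the end label
    _df = df.loc[start:start + num_rows - 1]
    dfs.append(_df)
dfs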
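If it helps, the same idea can be wrapped in a small function that tracks the start position directly and stops once a full chunk no longer fits. This is only a sketch: the name staggered_chunks is made up for illustration, and it assumes the gap should grow by exactly one row per chunk (1, 2, 3, ...), as in the example in the question. It uses iloc, so it does not depend on the index being the default 0..n-1.

```python
import numpy as np
import pandas as pd


def staggered_chunks(df, num_rows):
    """Split df into num_rows-sized chunks, skipping 1, 2, 3, ... rows between them.

    Hypothetical helper for illustration; assumes the gap grows by one row per chunk.
    """
    chunks = []
    start = 0
    gap = 1  # rows to skip after the current chunk
    # Stop as soon as a full chunk would run past the end of the frame.
    while start + num_rows <= len(df):
        chunks.append(df.iloc[start:start + num_rows])  # positional, end-exclusive
        start += num_rows + gap
        gap += 1
    return chunks


df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list("ABCD"))
chunks = staggered_chunks(df, 4)
print([list(c.index) for c in chunks])
```

For the 15-row frame from the question this keeps rows 0-3, 5-8 and 11-14. With a longer frame, any trailing rows that cannot fill a whole chunk are simply dropped, which may or may not be what you want.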