I have records in a spreadsheet and I need to partition group by ID_CLI.

My code is

Import pandas as pd

data = pd.read_excel('ReadSheet.xlsx')
step = 5
count = 0

for i in range(0, len(data), step):
    df = data[i:i   step]
    count  = 1

    df.to_excel('SpreadSheet_' count '.xlsx')

I need read the step in next position of the ID_CLI (step 6), and if is equal to the current one (step 5) it iterates 1 in my step, otherwise step = 5. Repeat in all records of my SpreadSheet keeping my original step = 5, and Iterates if next (ID_CLI) is equal current one

My Original SpreadSheet

I need some condicition like this:

if ID_CLI[step   1] == ID_CLI[step]:
      step  = 1                
else:
      step = step

POSITION    ID_CLI  NAME_CLI
1           1       Bruce Wayne
2           1       Bruce Wayne
3           2       Prakash Dahal
4           2       Prakash Dahal
5           3       Peter Parker
6           3       Peter Parker
7           4       Black Panther
8           4       Black Panther
9           4       Black Panther
10          4       Black Panther
11          5       Tony Stark
12          6       Doctor Strange
13          6       Doctor Strange
14          6       Doctor Strange
15          6       Doctor Strange
16          6       Doctor Strange
17          6       Doctor Strange
18          6       Doctor Strange
19          7       Ciclops
20          7       Ciclops
21          7       Ciclops
22          7       Ciclops
23          7       Ciclops
24          8       Nocturn

OUTPUT

        
**SpreadSheet_1.xlsx**      
POSITION    ID_CLI  NAME_CLI
1           1       Bruce Wayne
2           1       Bruce Wayne
3           2       Prakash Dahal
4           2       Prakash Dahal
5           3       Peter Parker # step = 5 and ID_CLI Equals next ID_CLI 
6           3       Peter Parker # step = 6 and ID_CLI Different next ID_CLI from step 7


**SpreadSheet_2.xlsx**      
POSITION    ID_CLI  NAME_CLI
7           4       Black Panther
8           4       Black Panther
9           4       Black Panther
10          4       Black Panther
11          5       Tony Stark # step = 5 and ID_CLI Different next ID_CLI from step 6


**SpreadSheet_3.xlsx**      
POSITION    ID_CLI  NAME_CLI
12          6       Doctor Strange
13          6       Doctor Strange
14          6       Doctor Strange
15          6       Doctor Strange
16          6       Doctor Strange # step = 5 and ID_CLI Equals next ID_CLI from step 6
17          6       Doctor Strange # step = 6 and ID_CLI Equals next ID_CLI from step 7
18          6       Doctor Strange # step = 7 and ID_CLI Different next ID_CLI from step 8


**SpreadSheet_4.xlsx**      
POSITION    ID_CLI  NAME_CLI
19          7       Ciclops
20          7       Ciclops
21          7       Ciclops
22          7       Ciclops
23          7       Ciclops # step = 5 and ID_CLI Different next ID_CLI from step 6


**SpreadSheet_4.xlsx**
POSITION    ID_CLI  NAME_CLI
24          8       Nocturn # End Records

CodePudding user response：

This should work.

"""
- creates chunk for size of 5
- if duplicate data is in 5th place and 6th place, it will increase chunk size until the final duplicate data place is found and creates its chunk.
"""

step = 5
a = 0

df = pd.read_excel('sample.xlsx')
df['POSITION'] = df['POSITION'].astype(int)
df['ID_CLI'] = df['ID_CLI'].astype(int)
df['POS_INDEX'] = df['POSITION']
df = df.set_index('POS_INDEX')

while True:
    if step >= len(df):
        a  = 1
        df.loc[1:step].to_excel(f'SpreadSheet_{a}.xlsx', index=False)
        break
        
    if df.loc[step,'NAME_CLI'] == df.loc[step 1,'NAME_CLI']:
        while df.loc[step,'NAME_CLI'] == df.loc[step 1,'NAME_CLI']:
            step  = 1
        a  = 1
        df.loc[1:step].to_excel(f'SpreadSheet_{a}.xlsx', index=False)
        df.drop(list(range(1,step 1)), axis=0,inplace=True) #droping chunk to create new df
        df.index = list(range(1,len(df) 1))
        df.index.name = 'POS_INDEX'
        step=5
    else:
        step = 5
        a  = 1
        df.loc[1:step].to_excel(f'SpreadSheet_{a}.xlsx', index=False)
        df.drop(list(range(1,step 1)), axis=0,inplace=True)
        df.index = list(range(1,len(df) 1))
        df.index.name = 'POS_INDEX'