I have records in a spreadsheet and I need to partition group by ID_CLI.
My code is
Import pandas as pd
data = pd.read_excel('ReadSheet.xlsx')
step = 5
count = 0
for i in range(0, len(data), step):
df = data[i:i step]
count = 1
df.to_excel('SpreadSheet_' count '.xlsx')
I need read the step in next position of the ID_CLI (step 6), and if is equal to the current one (step 5) it iterates 1 in my step, otherwise step = 5. Repeat in all records of my SpreadSheet keeping my original step = 5, and Iterates if next (ID_CLI) is equal current one
My Original SpreadSheet
I need some condicition like this:
if ID_CLI[step 1] == ID_CLI[step]:
step = 1
else:
step = step
POSITION ID_CLI NAME_CLI
1 1 Bruce Wayne
2 1 Bruce Wayne
3 2 Prakash Dahal
4 2 Prakash Dahal
5 3 Peter Parker
6 3 Peter Parker
7 4 Black Panther
8 4 Black Panther
9 4 Black Panther
10 4 Black Panther
11 5 Tony Stark
12 6 Doctor Strange
13 6 Doctor Strange
14 6 Doctor Strange
15 6 Doctor Strange
16 6 Doctor Strange
17 6 Doctor Strange
18 6 Doctor Strange
19 7 Ciclops
20 7 Ciclops
21 7 Ciclops
22 7 Ciclops
23 7 Ciclops
24 8 Nocturn
OUTPUT
**SpreadSheet_1.xlsx**
POSITION ID_CLI NAME_CLI
1 1 Bruce Wayne
2 1 Bruce Wayne
3 2 Prakash Dahal
4 2 Prakash Dahal
5 3 Peter Parker # step = 5 and ID_CLI Equals next ID_CLI
6 3 Peter Parker # step = 6 and ID_CLI Different next ID_CLI from step 7
**SpreadSheet_2.xlsx**
POSITION ID_CLI NAME_CLI
7 4 Black Panther
8 4 Black Panther
9 4 Black Panther
10 4 Black Panther
11 5 Tony Stark # step = 5 and ID_CLI Different next ID_CLI from step 6
**SpreadSheet_3.xlsx**
POSITION ID_CLI NAME_CLI
12 6 Doctor Strange
13 6 Doctor Strange
14 6 Doctor Strange
15 6 Doctor Strange
16 6 Doctor Strange # step = 5 and ID_CLI Equals next ID_CLI from step 6
17 6 Doctor Strange # step = 6 and ID_CLI Equals next ID_CLI from step 7
18 6 Doctor Strange # step = 7 and ID_CLI Different next ID_CLI from step 8
**SpreadSheet_4.xlsx**
POSITION ID_CLI NAME_CLI
19 7 Ciclops
20 7 Ciclops
21 7 Ciclops
22 7 Ciclops
23 7 Ciclops # step = 5 and ID_CLI Different next ID_CLI from step 6
**SpreadSheet_4.xlsx**
POSITION ID_CLI NAME_CLI
24 8 Nocturn # End Records
CodePudding user response:
This should work.
"""
- creates chunk for size of 5
- if duplicate data is in 5th place and 6th place, it will increase chunk size until the final duplicate data place is found and creates its chunk.
"""
step = 5
a = 0
df = pd.read_excel('sample.xlsx')
df['POSITION'] = df['POSITION'].astype(int)
df['ID_CLI'] = df['ID_CLI'].astype(int)
df['POS_INDEX'] = df['POSITION']
df = df.set_index('POS_INDEX')
while True:
if step >= len(df):
a = 1
df.loc[1:step].to_excel(f'SpreadSheet_{a}.xlsx', index=False)
break
if df.loc[step,'NAME_CLI'] == df.loc[step 1,'NAME_CLI']:
while df.loc[step,'NAME_CLI'] == df.loc[step 1,'NAME_CLI']:
step = 1
a = 1
df.loc[1:step].to_excel(f'SpreadSheet_{a}.xlsx', index=False)
df.drop(list(range(1,step 1)), axis=0,inplace=True) #droping chunk to create new df
df.index = list(range(1,len(df) 1))
df.index.name = 'POS_INDEX'
step=5
else:
step = 5
a = 1
df.loc[1:step].to_excel(f'SpreadSheet_{a}.xlsx', index=False)
df.drop(list(range(1,step 1)), axis=0,inplace=True)
df.index = list(range(1,len(df) 1))
df.index.name = 'POS_INDEX'