Split a pandas DataFrame into chunks with no common values between chunks

I have a pandas DataFrame with a column ID, whose values are the ones listed in the chunks below.

Is there an efficient, automatic way to split it into n chunks (let's say n = 4) such that the result looks like this:

chunk-1:
ID
1
1
1
20
20

chunk-2:
ID
30
50
50
51

chunk-3:
ID
60
60
300
300

chunk-4:
ID
302
302
500

Note that:

1. The chunks do not have to be the same size.
2. No value is shared between any two chunks.

I tried a simple df_split = np.array_split(df, 4), but it does not satisfy the second condition above.
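
To make the failure concrete, here is a minimal sketch. It assumes the ID values listed in the chunks above. It splits row positions with np.array_split and then selects rows with iloc, which is meant to reproduce the row blocks of np.array_split(df, 4) without depending on how a particular NumPy/pandas version handles a DataFrame argument. The names position_blocks, row_chunks and shared_ids are illustrative only.

```python
import numpy as np
import pandas as pd

# ID values taken from the four desired chunks listed above
df = pd.DataFrame({"ID": [1, 1, 1, 20, 20, 30, 50, 50, 51, 60, 60, 300, 300, 302, 302, 500]})

# Split the 16 row positions into 4 blocks of 4 rows, ignoring the ID values
position_blocks = np.array_split(np.arange(len(df)), 4)
row_chunks = [df.iloc[block] for block in position_blocks]

# For each pair of neighbouring chunks, collect the IDs they have in common
shared_ids = [
    set(left["ID"]) & set(right["ID"])
    for left, right in zip(row_chunks, row_chunks[1:])
]

# IDs 20 and 300 each straddle a chunk boundary, so the second condition is violated
print(shared_ids)
```

A purely positional split cuts wherever the row count says to, so any ID whose rows sit on a boundary ends up in two chunks.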

CodePudding user response:

Use groupby to split the data into one DataFrame per distinct id value.

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,20,20,30,40,40,51,60,60,300,300,302,302,500]})
# One sub-DataFrame per distinct id; groupby sorts the keys by default
df_grouped = [subgroup for _, subgroup in df.groupby('id')]

If you want to gather this result into four groups, keep the first three sub-DataFrames and concatenate the rest:

df_grouped_new = [df_grouped[0], df_grouped[1], df_grouped[2], pd.concat(df_grouped[3:])]

Printing each group then gives:

print(df_grouped_new[0])
>>> 
   id
0   1
1   1
2   1

print(df_grouped_new[1])
>>>
   id
3  20
4  20

print(df_grouped_new[2])
>>>
   id
5  30

print(df_grouped_new[3])
>>> 
     id
6    40
7    40
8    51
9    60
10   60
11  300
12  300
13  302
14  302
15  500
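
The grouping above hard-codes which ids go into which chunk, so the last chunk ends up much larger than the others. One possible generalisation for an arbitrary n, sketched here under the assumption that splitting the distinct id values (rather than the rows) is acceptable, is to divide the unique ids into n blocks and select the rows belonging to each block. The helper name split_by_unique_ids and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def split_by_unique_ids(frame, column, n_chunks):
    """Split frame into n_chunks pieces so that no value of `column` is in two pieces."""
    # Distinct values in order of first appearance
    unique_ids = frame[column].drop_duplicates().to_numpy()
    # np.array_split allows blocks of unequal length
    id_blocks = np.array_split(unique_ids, n_chunks)
    # Each chunk holds every row whose value falls in that block of ids
    return [frame[frame[column].isin(block)] for block in id_blocks]

df = pd.DataFrame({'id': [1,1,1,20,20,30,40,40,51,60,60,300,300,302,302,500]})
chunks = split_by_unique_ids(df, 'id', 4)

for chunk in chunks:
    print(chunk['id'].tolist())
# [1, 1, 1, 20, 20, 30]
# [40, 40, 51]
# [60, 60, 300, 300]
# [302, 302, 500]
```

Because every row with a given id follows that id into exactly one block, the chunks cannot share values. The chunks are balanced by number of distinct ids, not by number of rows, and if n_chunks exceeds the number of distinct ids some chunks will be empty.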