split pandas with no common values between splits

Time:08-24

I have a pandas DataFrame with an ID column, as follows.

ID
1 
1
1
20
20
30
50
50
51
60
60
300
300
302
302
500

Is there an efficient, automatic way to split this into n chunks, say n=4, as follows:

chunk-1:
ID
1
1
1
20
20

chunk-2:
ID
30
50
50
51

chunk-3:
ID
60
60
300
300

chunk-4:
ID
302
302
500

Note that:

  1. the chunks may have different sizes;
  2. no ID value may appear in more than one chunk.

I tried a simple df_split = np.array_split(df, 4), but it does not satisfy condition 2.
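A quick way to see why the positional split fails condition 2 is to split the rows into four equal parts and list the IDs that land in more than one part. This sketch splits row positions with `np.array_split` and selects them with `iloc` (same row boundaries as `np.array_split(df, 4)`, without passing the DataFrame to NumPy); the helper name `shared_values` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 20, 20, 30, 50, 50, 51, 60, 60, 300, 300, 302, 302, 500]})

# Equal-size positional split: same row boundaries as np.array_split(df, 4)
parts = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), 4)]


def shared_values(chunks, col):
    """Hypothetical helper: return, sorted, the values of `col` found in more than one chunk."""
    seen, shared = set(), set()
    for chunk in chunks:
        values = set(chunk[col].tolist())
        shared |= seen & values  # values that already appeared in an earlier chunk
        seen |= values
    return sorted(shared)


print(shared_values(parts, 'ID'))  # [20, 300]
```

Each part has exactly 4 rows, so the two 20s and the two 300s get cut across a chunk boundary.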

Answer:

Use groupby to split the data, so that each distinct id gets its own DataFrame.

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,20,20,30,50,50,51,60,60,300,300,302,302,500]})
# one sub-DataFrame per distinct id, in ascending id order
df_grouped = [subgroup for _, subgroup in df.groupby('id')]

To gather the result into four chunks, keep the first three groups and concatenate the rest (the split points here are hard-coded):

# chunks 1-3 hold a single id each; chunk 4 holds every remaining id
df_grouped_new = [df_grouped[0], df_grouped[1], df_grouped[2], pd.concat(df_grouped[3:])]

Then

print(df_grouped_new[0])
>>>
   id
0   1
1   1
2   1
print(df_grouped_new[1])
>>>
   id
3  20
4  20
print(df_grouped_new[2])
>>>
   id
5  30
print(df_grouped_new[3])
>>>
     id
6    50
7    50
8    51
9    60
10   60
11  300
12  300
13  302
14  302
15  500
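The four-way grouping above is picked by hand. One way to make it automatic while keeping condition 2 is to pack whole groups greedily: walk the groups in order and close a chunk once it holds at least len(df) / n rows. This is a minimal sketch under that assumption; the function name `split_without_shared_ids` and the "at least len(df) / n rows" rule are my own choices, not part of the answer above.

```python
import pandas as pd


def split_without_shared_ids(df, col, n):
    """Greedily pack whole groups of equal `col` values into at most n chunks.

    A chunk is closed as soon as it holds at least len(df) / n rows; the last
    chunk takes whatever is left, so every value lands in exactly one chunk.
    """
    target = len(df) / n
    chunks, current, current_size = [], [], 0
    # sort=False keeps the groups in order of first appearance
    for _, group in df.groupby(col, sort=False):
        current.append(group)
        current_size += len(group)
        # close the chunk once it is big enough, but never close the n-th chunk early
        if current_size >= target and len(chunks) < n - 1:
            chunks.append(pd.concat(current))
            current, current_size = [], 0
    if current:
        chunks.append(pd.concat(current))
    return chunks


df = pd.DataFrame({'ID': [1, 1, 1, 20, 20, 30, 50, 50, 51, 60, 60, 300, 300, 302, 302, 500]})
for i, chunk in enumerate(split_without_shared_ids(df, 'ID', 4), start=1):
    print(f"chunk-{i}:", chunk['ID'].tolist())
```

For the question's data this prints the four chunks listed in the question. Because groups are never cut, a large group can overshoot the target, so the function may return fewer than n chunks (for example, one group of 10 rows followed by two small groups with n=3 gives 2 chunks).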