I have a pandas DataFrame with a column ID, as follows.
ID
1
1
1
20
20
30
50
50
51
60
60
300
300
302
302
500
Is there an automatic, efficient way to split this into n chunks (let's say n=4) such that:
chunk-1:
ID
1
1
1
20
20
chunk-2:
ID
30
50
50
51
chunk-3:
ID
60
60
300
300
chunk-4:
ID
302
302
500
It should be noted that:
- The size of each chunk need not be the same.
- There are no common values between any of the chunks.
I tried a simple df_split = np.array_split(df, 4),
but it does not fulfil the second of the two conditions above.
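To see why a purely positional split breaks the second condition, here is a minimal sketch using the data above. It splits row positions into four nearly equal blocks (assumed here to be the same boundaries np.array_split would pick for the DataFrame itself) and lists the IDs that land in more than one chunk. The names position_blocks, naive_chunks and shared_ids are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"ID": [1, 1, 1, 20, 20, 30, 50, 50, 51, 60, 60, 300, 300, 302, 302, 500]}
)

# Split the row positions 0..15 into 4 nearly equal blocks of 4 rows each.
position_blocks = np.array_split(np.arange(len(df)), 4)
naive_chunks = [df.iloc[block] for block in position_blocks]

# Record, for every ID, the set of chunks it appears in.
chunks_per_id = {}
for i, chunk in enumerate(naive_chunks):
    for value in chunk["ID"].unique():
        chunks_per_id.setdefault(int(value), set()).add(i)

# IDs that straddle a chunk boundary violate the "no common values" condition.
shared_ids = sorted(v for v, where in chunks_per_id.items() if len(where) > 1)
print(shared_ids)  # prints [20, 300]
```

The split only looks at row positions, so any ID whose rows straddle a boundary ends up in two chunks.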
CodePudding user response:
Use groupby to split the data, so that each distinct id ends up in its own sub-DataFrame.
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,20,20,30,40,40,51,60,60,300,300,302,302,500]})
df_grouped = [subgroup for _, subgroup in df.groupby('id')]
If you want to gather this result into four groups, keep the first three and concatenate the rest:
df_grouped_new = [df_grouped[0], df_grouped[1], df_grouped[2], pd.concat(df_grouped[3:])]
Then
print(df_grouped_new[0])
>>>
id
0 1
1 1
2 1
print(df_grouped_new[1])
>>>
id
3 20
4 20
print(df_grouped_new[2])
>>>
id
5 30
print(df_grouped_new[3])
>>>
id
6 40
7 40
8 51
9 60
10 60
11 300
12 300
13 302
14 302
15 500
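The grouping above puts most rows into the last chunk. If more even chunks are wanted, one possible sketch is to split the unique IDs (rather than the rows) into n parts and then select the rows for each part. This is an assumption about what "efficient" should mean here, not the only way to do it; split_by_id is a hypothetical helper name, and the example uses the column name ID and the data from the question.

```python
import numpy as np
import pandas as pd


def split_by_id(df, n, col="ID"):
    """Split df into n chunks so that no value of `col` spans two chunks."""
    # Unique IDs in order of first appearance.
    ids = df[col].unique()
    # Divide the IDs (not the rows) into n nearly equal groups.
    id_groups = np.array_split(ids, n)
    # Each chunk holds every row whose ID belongs to that group.
    return [df[df[col].isin(group)] for group in id_groups]


df = pd.DataFrame(
    {"ID": [1, 1, 1, 20, 20, 30, 50, 50, 51, 60, 60, 300, 300, 302, 302, 500]}
)
chunks = split_by_id(df, 4)

for i, chunk in enumerate(chunks, start=1):
    print(f"chunk-{i}:", chunk["ID"].tolist())
```

Because the split is balanced by number of distinct IDs, the chunk sizes in rows can still differ, which the question allows. If n is larger than the number of distinct IDs, some chunks come back empty.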