Divide data into train, validation, and test based on group ID and index

I have this dataset. I want to split the data into training, validation, and test sets (60/20/20) while respecting the group ID and the index order.

For example, for Group_ID = 1 the first 60% of its rows go to training (indexes 0, 1, 2), the next 20% go to validation (index 3), and the remaining 20% go to testing (index 4). The same applies to every group ID.

import pandas as pd

df = pd.DataFrame({'Group_ID': [1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3],
                   'Target':   [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1,0,0,0,1,1]})

Answer:

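Try GroupBy.sample, then drop the sampled rows by index. Note that sample draws rows at random within each group; it does not take the first 60% in index order.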

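train = df.groupby('Group_ID').sample(frac=0.6, random_state=0)  # ~60% of each group
rest = df.drop(train.index)                                       # the remaining ~40%
test = rest.groupby('Group_ID').sample(frac=0.5, random_state=0)  # half of the rest, ~20%
valid = rest.drop(test.index)                                     # the other ~20%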
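If the split must follow row order inside each group, as the question describes, rather than be random, something like the sketch below may fit better. It assumes the rows are already in the desired order within each group. It uses groupby().cumcount() to get each row's position within its group and takes the floor of 60% and 80% of the group size as cut-offs. That rounding is one possible choice: a 12-row group ends up 7/2/3.

```python
import pandas as pd

df = pd.DataFrame({'Group_ID': [1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3],
                   'Target':   [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1,0,0,0,1,1]})

# Position of each row within its group (0, 1, 2, ...), following the current row order
pos = df.groupby('Group_ID').cumcount()
# Size of each row's group, mapped back onto every row
size = df['Group_ID'].map(df['Group_ID'].value_counts())

# Cut-offs as floor(60%) and floor(80%) of the group size; integer math avoids float rounding
train_end = size * 60 // 100
val_end = size * 80 // 100

train = df[pos < train_end]
valid = df[(pos >= train_end) & (pos < val_end)]
test = df[pos >= val_end]

print(train.index.tolist())
print(valid.index.tolist())
print(test.index.tolist())
```

For Group_ID = 1 this gives indexes 0, 1, 2 for training, 3 for validation and 4 for testing, matching the example in the question. Switching to a different rounding rule only requires changing how train_end and val_end are computed.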