I have this dataset. I want to split it into training, validation, and test sets (60%, 20%, 20%), taking both the group ID and the row order within each group into account.
Example: Group_ID = 1 has five rows, so the first 60% go to training (indexes 0, 1, 2), the next 20% to validation (index 3), and the rest to testing (index 4), and so on for all group IDs.
import pandas as pd
df = pd.DataFrame({'Group_ID': [1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3],
                   'Target': [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1,0,0,0,1,1]})
CodePudding user response:
Try with sample, then drop the index:
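# randomly sample 60% of the rows in each group for training
train = df.groupby('Group_ID').sample(frac=0.6)
# split the remaining 40% in half per group: 20% test, 20% validation
test = df.drop(train.index).groupby('Group_ID').sample(frac=0.5)
valid = df.drop(train.index).drop(test.index)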
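
Note that sample picks rows at random, so the split above gives roughly 60/20/20 per group but does not follow the index order asked for in the question. If the first 60% of each group by position should go to training, one possible sketch uses groupby().cumcount() together with the group size. The cut-point names (train_end, valid_end) and the choice to round the cut points down are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Group_ID': [1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3],
                   'Target': [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1,0,0,0,1,1]})

grouped = df.groupby('Group_ID')
pos = grouped.cumcount()                        # 0-based position of each row within its group
size = grouped['Group_ID'].transform('size')    # number of rows in the row's group

# Integer cut points per row (rounded down, an assumption):
# rows before train_end go to training, rows before valid_end to validation.
train_end = size * 6 // 10
valid_end = size * 8 // 10

train = df[pos < train_end]
valid = df[(pos >= train_end) & (pos < valid_end)]
test = df[pos >= valid_end]
```

For Group_ID = 1 this puts indexes 0, 1, 2 in train, index 3 in valid, and index 4 in test, as in the example in the question. For groups whose size is not a multiple of 5, rounding down sends the leftover rows to the later splits; a different rounding rule may suit other data better.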