I have a data set with 2 columns: column A is the winning team and column B is the losing team. I would like to shuffle the values between the 2 columns so that, when I start training the model, it can't predict the winning team simply by looking at the winning-team column.
Before
A B
1 2
2 1
3 4
4 3
After
A B
1 2
2 1
4 3
3 4
Answer:
Here is a way to shuffle the values within each row independently at random:
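import numpy as np
import pandas as pd

# Randomly permute the values within each row, keeping df's column names and index
(pd.DataFrame(df.apply(lambda x: np.random.choice(x, df.columns.size, replace=False), axis=1).tolist(),
              columns=df.columns, index=df.index))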
Example output (the shuffle is random, so it will differ between runs):
A B
0 1 2
1 1 2
2 3 4
3 4 3
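Note that shuffling alone throws away which team actually won, and a model still needs that as its target. With exactly two columns, shuffling each row is the same as swapping A and B with probability 1/2, so the swap mask can double as a label. The following is a minimal sketch under those assumptions; the label column name `A_won` and the fixed seed are hypothetical choices for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Example data from the question: A = winning team, B = losing team
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 1, 4, 3]})

rng = np.random.default_rng(seed=0)  # seed only for reproducibility (assumption)

# For each row, decide independently whether to swap the two teams (50/50)
swap = rng.random(len(df)) < 0.5

out = df.copy()
# .to_numpy() avoids pandas aligning on column names, so B's values land in A and vice versa
out.loc[swap, ["A", "B"]] = df.loc[swap, ["B", "A"]].to_numpy()

# Hypothetical label column: 1 if the team now in column A is the winner, else 0
out["A_won"] = (~swap).astype(int)
print(out)
```

Because `A_won` is derived from the same mask that decided the swaps, the model sees winners in column A only about half the time, while the true outcome is still available as the training target.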