How can I shuffle two columns of a data.frame simultaneously?

I have a data frame as follows. The original data frame has more than 100 rows.

df <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 2), c = rnorm(10, 2, 1), d = rnorm(10, 1, 2))

        a           b         c          d
1  -0.56047565  3.44816359 0.9321763 1.8529284
2  -0.23017749  1.71962765 1.7820251 0.4098570
3   1.55870831  1.80154290 0.9739956 2.7902513
4   0.07050839  1.22136543 1.2711088 2.7562670
5   0.12928774 -0.11168227 1.3749607 2.6431622
6   1.71506499  4.57382627 0.3133067 2.3772805
7   0.46091621  1.99570096 2.8377870 2.1078353
8  -1.26506123 -2.93323431 2.1533731 0.8761766
9  -0.68685285  2.40271180 0.8618631 0.3880747
10 -0.44566197  0.05441718 3.2538149 0.2390580



I want to shuffle columns b and d 1000 times to get 1000 data frames. In each data frame, every value in column b should stay paired with its corresponding value in column d, as in the original data frame. By 'shuffle' I mean that the row order of columns b and d should be randomized while the pairing between b and d is maintained. To be clear, I want the row values to be reordered, but only for columns b and d; columns a and c should stay as they are.

Answer:

To reorder the 'b' and 'd' columns randomly while keeping their values paired, sample the sequence of row indices and use that single permutation to reorder just those two columns of the dataset. This can be replicated n times.

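f1 <- function(dat) {
   # random permutation of the row indices
   i1 <- sample(seq_len(nrow(dat)))
   # reorder only 'b' and 'd', using the same permutation for both
   dat[c('b', 'd')] <- dat[i1, c('b', 'd')]
   dat
}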
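
If the columns to shuffle vary, the same idea can be written with the column names passed in as an argument. This is only a sketch: the function name shuffle_cols, its cols parameter, and the toy data frame are hypothetical and not part of the original answer.

```r
# Hypothetical helper: permute the rows of the chosen columns together,
# leaving every other column in its original order.
shuffle_cols <- function(dat, cols) {
  stopifnot(all(cols %in% names(dat)))
  idx <- sample(seq_len(nrow(dat)))           # one permutation shared by all 'cols'
  dat[cols] <- dat[idx, cols, drop = FALSE]   # drop = FALSE keeps a data frame even for one column
  dat
}

set.seed(1)  # assumption: seed set only so the illustration is reproducible
# Toy data where d is always b + 20, so the pairing is easy to check by eye.
toy <- data.frame(a = 1:5, b = 11:15, c = 21:25, d = 31:35)
shuffled <- shuffle_cols(toy, c("b", "d"))
shuffled
```

Because a single index vector is applied to both columns, d - b stays 20 in every row of shuffled, while a and c keep their original order.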

-testing

n <- 1000
lst1Out <- replicate(n, f1(df), simplify = FALSE)
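
One way to convince yourself that the pairing survives in all replicates is to compare the (b, d) pairs of each shuffled data frame with those of the original. The sketch below re-creates df and f1 so that it runs on its own; the seed and the helper name pair_key are assumptions added for illustration.

```r
set.seed(42)  # assumption: seed added only for reproducibility
df <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 2),
                 c = rnorm(10, 2, 1), d = rnorm(10, 1, 2))

f1 <- function(dat) {
  i1 <- sample(seq_len(nrow(dat)))
  dat[c("b", "d")] <- dat[i1, c("b", "d")]
  dat
}

lst1Out <- replicate(1000, f1(df), simplify = FALSE)

# Hypothetical helper: encode each row's (b, d) pair as one string key.
pair_key <- function(dat) paste(dat$b, dat$d)

# Every shuffled data frame must contain exactly the original (b, d) pairs.
pairs_kept <- vapply(lst1Out,
                     function(x) setequal(pair_key(x), pair_key(df)),
                     logical(1))

# Columns a and c must be untouched in every replicate.
others_kept <- vapply(lst1Out,
                      function(x) identical(x$a, df$a) && identical(x$c, df$c),
                      logical(1))

all(pairs_kept)
all(others_kept)
```

Both checks should return TRUE, since f1 applies one permutation to b and d together and never modifies a or c.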

-output

> lst1Out[1:2]
[[1]]
            a           b         c          d
1   1.6821761  2.42533261 1.6757297 -0.5016380
2  -0.6357365 -0.01191492 2.0601604 -2.8287189
3  -0.4616447  3.68607765 1.4111055  3.3531666
4   1.4322822  0.57084118 2.5314962 -2.3299449
5  -0.6506964  0.64533904 0.4816059  0.8862064
6  -0.2073807  0.92473166 2.3065579  1.0347912
7  -0.3928079  0.85287119 0.4635502  5.1743331
8  -0.3199929  0.64088694 1.6990239  0.0729392
9  -0.2791133  0.79961852 1.4717201 -1.2318402
10  0.4941883 -0.36332096 1.3479052 -1.5726011

[[2]]
            a           b         c          d
1   1.6821761 -0.01191492 1.6757297 -2.8287189
2  -0.6357365  0.92473166 2.0601604  1.0347912
3  -0.4616447 -0.36332096 1.4111055 -1.5726011
4   1.4322822  0.57084118 2.5314962 -2.3299449
5  -0.6506964  0.64533904 0.4816059  0.8862064
6  -0.2073807  0.85287119 2.3065579  5.1743331
7  -0.3928079  0.64088694 0.4635502  0.0729392
8  -0.3199929  3.68607765 1.6990239  3.3531666
9  -0.2791133  2.42533261 1.4717201 -0.5016380
10  0.4941883  0.79961852 1.3479052 -1.2318402

-original data

> df
            a           b         c          d
1   1.6821761  0.64533904 1.6757297  0.8862064
2  -0.6357365 -0.01191492 2.0601604 -2.8287189
3  -0.4616447  3.68607765 1.4111055  3.3531666
4   1.4322822  0.57084118 2.5314962 -2.3299449
5  -0.6506964  0.64088694 0.4816059  0.0729392
6  -0.2073807  0.79961852 2.3065579 -1.2318402
7  -0.3928079  2.42533261 0.4635502 -0.5016380
8  -0.3199929  0.85287119 1.6990239  5.1743331
9  -0.2791133  0.92473166 1.4717201  1.0347912
10  0.4941883 -0.36332096 1.3479052 -1.5726011