How can I shuffle two columns of a data.frame simultaneously?

Time:11-01

I have a data frame as follows. The original data frame has more than 100 rows.


df <- data.frame(a=rnorm(10, 0,1), b=rnorm(10,1,2), c=rnorm(10, 2, 1), d=rnorm(10, 1,2))

        a           b         c          d
1  -0.56047565  3.44816359 0.9321763 1.8529284
2  -0.23017749  1.71962765 1.7820251 0.4098570
3   1.55870831  1.80154290 0.9739956 2.7902513
4   0.07050839  1.22136543 1.2711088 2.7562670
5   0.12928774 -0.11168227 1.3749607 2.6431622
6   1.71506499  4.57382627 0.3133067 2.3772805
7   0.46091621  1.99570096 2.8377870 2.1078353
8  -1.26506123 -2.93323431 2.1533731 0.8761766
9  -0.68685285  2.40271180 0.8618631 0.3880747
10 -0.44566197  0.05441718 3.2538149 0.2390580



I want to shuffle columns b and d 1000 times to get 1000 data frames. In each data frame, every value in column b should keep its corresponding value in column d, as in the original data frame. By 'shuffle' I mean that the order of the rows should be randomized for columns b and d only, with each (b, d) pair staying together; columns a and c should stay as they are.

Answer:

To reorder the 'b' and 'd' columns randomly while keeping them paired, sample the sequence of row indices and use that permutation to shuffle just those two columns of the dataset. This can then be replicated n times.

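f1 <- function(dat) {
   # draw a random permutation of the row indices
   i1 <- sample(seq_len(nrow(dat)))
   # apply the same permutation to 'b' and 'd' so their pairing is kept
   dat[c('b', 'd')] <- dat[i1, c('b', 'd')]
   dat
}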

-testing

n <- 1000
lst1Out <- replicate(n, f1(df), simplify = FALSE)

-output

> lst1Out[1:2]
[[1]]
            a           b         c          d
1   1.6821761  2.42533261 1.6757297 -0.5016380
2  -0.6357365 -0.01191492 2.0601604 -2.8287189
3  -0.4616447  3.68607765 1.4111055  3.3531666
4   1.4322822  0.57084118 2.5314962 -2.3299449
5  -0.6506964  0.64533904 0.4816059  0.8862064
6  -0.2073807  0.92473166 2.3065579  1.0347912
7  -0.3928079  0.85287119 0.4635502  5.1743331
8  -0.3199929  0.64088694 1.6990239  0.0729392
9  -0.2791133  0.79961852 1.4717201 -1.2318402
10  0.4941883 -0.36332096 1.3479052 -1.5726011

[[2]]
            a           b         c          d
1   1.6821761 -0.01191492 1.6757297 -2.8287189
2  -0.6357365  0.92473166 2.0601604  1.0347912
3  -0.4616447 -0.36332096 1.4111055 -1.5726011
4   1.4322822  0.57084118 2.5314962 -2.3299449
5  -0.6506964  0.64533904 0.4816059  0.8862064
6  -0.2073807  0.85287119 2.3065579  5.1743331
7  -0.3928079  0.64088694 0.4635502  0.0729392
8  -0.3199929  3.68607765 1.6990239  3.3531666
9  -0.2791133  2.42533261 1.4717201 -0.5016380
10  0.4941883  0.79961852 1.3479052 -1.2318402

-original data

> df
            a           b         c          d
1   1.6821761  0.64533904 1.6757297  0.8862064
2  -0.6357365 -0.01191492 2.0601604 -2.8287189
3  -0.4616447  3.68607765 1.4111055  3.3531666
4   1.4322822  0.57084118 2.5314962 -2.3299449
5  -0.6506964  0.64088694 0.4816059  0.0729392
6  -0.2073807  0.79961852 2.3065579 -1.2318402
7  -0.3928079  2.42533261 0.4635502 -0.5016380
8  -0.3199929  0.85287119 1.6990239  5.1743331
9  -0.2791133  0.92473166 1.4717201  1.0347912
10  0.4941883 -0.36332096 1.3479052 -1.5726011
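
-sanity check

A quick way to confirm that the shuffle behaves as requested is to check that every shuffled data frame still contains exactly the original (b, d) pairs and that columns a and c are untouched. The sketch below is not part of the original answer: the seed value and the helper name pairs_kept are assumptions made for illustration, and f1 is repeated only so the snippet runs on its own.

```r
# Illustrative check; the seed and the helper `pairs_kept` are hypothetical.
set.seed(42)  # assumed seed, only to make the run reproducible

df <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 2),
                 c = rnorm(10, 2, 1), d = rnorm(10, 1, 2))

# Same shuffling function as in the answer.
f1 <- function(dat) {
  i1 <- sample(seq_len(nrow(dat)))
  dat[c("b", "d")] <- dat[i1, c("b", "d")]
  dat
}

lst1Out <- replicate(1000, f1(df), simplify = FALSE)

# TRUE when `shuffled` holds the same set of (b, d) pairs as `original`
# and columns a and c are in their original order.
pairs_kept <- function(shuffled, original) {
  key <- function(x) sort(paste(x$b, x$d))
  identical(key(shuffled), key(original)) &&
    identical(shuffled$a, original$a) &&
    identical(shuffled$c, original$c)
}

all_ok <- all(vapply(lst1Out, pairs_kept, logical(1), original = df))
all_ok
```

If all_ok is TRUE, each of the 1000 data frames is a row-wise permutation of the (b, d) pairs only. Comparing pasted keys is a simple choice that works here because the values are copied, not recomputed; for data with many duplicated pairs the check still holds, since sorting keeps duplicates.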