I have a data frame like the one below. The original data frame has more than 100 rows.
df <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 2), c = rnorm(10, 2, 1), d = rnorm(10, 1, 2))
a b c d
1 -0.56047565 3.44816359 0.9321763 1.8529284
2 -0.23017749 1.71962765 1.7820251 0.4098570
3 1.55870831 1.80154290 0.9739956 2.7902513
4 0.07050839 1.22136543 1.2711088 2.7562670
5 0.12928774 -0.11168227 1.3749607 2.6431622
6 1.71506499 4.57382627 0.3133067 2.3772805
7 0.46091621 1.99570096 2.8377870 2.1078353
8 -1.26506123 -2.93323431 2.1533731 0.8761766
9 -0.68685285 2.40271180 0.8618631 0.3880747
10 -0.44566197 0.05441718 3.2538149 0.2390580
I want to shuffle column b and column d 1000 times to get 1000 data frames. In each data frame, every value in column b should stay paired with its corresponding value in column d, as in the original data frame. By 'shuffle' I mean that the order of the rows has to be randomized, while columns b and d stay aligned with each other. To be clear, I want the row values to be reordered, but only for columns b and d.
Answer:
If we want to reorder the 'b' and 'd' columns randomly, sample the sequence of rows and use that index to shuffle that subset of the dataset. This can be replicated n times.
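f1 <- function(dat) {
  # draw a random permutation of the row indices
  i1 <- sample(seq_len(nrow(dat)))
  # apply the same permutation to 'b' and 'd' so each (b, d) pair stays together
  dat[c('b', 'd')] <- dat[i1, c('b', 'd')]
  dat
}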
-testing
n <- 1000
lst1Out <- replicate(n, f1(df), simplify = FALSE)
-output
> lst1Out[1:2]
[[1]]
a b c d
1 1.6821761 2.42533261 1.6757297 -0.5016380
2 -0.6357365 -0.01191492 2.0601604 -2.8287189
3 -0.4616447 3.68607765 1.4111055 3.3531666
4 1.4322822 0.57084118 2.5314962 -2.3299449
5 -0.6506964 0.64533904 0.4816059 0.8862064
6 -0.2073807 0.92473166 2.3065579 1.0347912
7 -0.3928079 0.85287119 0.4635502 5.1743331
8 -0.3199929 0.64088694 1.6990239 0.0729392
9 -0.2791133 0.79961852 1.4717201 -1.2318402
10 0.4941883 -0.36332096 1.3479052 -1.5726011
[[2]]
a b c d
1 1.6821761 -0.01191492 1.6757297 -2.8287189
2 -0.6357365 0.92473166 2.0601604 1.0347912
3 -0.4616447 -0.36332096 1.4111055 -1.5726011
4 1.4322822 0.57084118 2.5314962 -2.3299449
5 -0.6506964 0.64533904 0.4816059 0.8862064
6 -0.2073807 0.85287119 2.3065579 5.1743331
7 -0.3928079 0.64088694 0.4635502 0.0729392
8 -0.3199929 3.68607765 1.6990239 3.3531666
9 -0.2791133 2.42533261 1.4717201 -0.5016380
10 0.4941883 0.79961852 1.3479052 -1.2318402
-original data
> df
a b c d
1 1.6821761 0.64533904 1.6757297 0.8862064
2 -0.6357365 -0.01191492 2.0601604 -2.8287189
3 -0.4616447 3.68607765 1.4111055 3.3531666
4 1.4322822 0.57084118 2.5314962 -2.3299449
5 -0.6506964 0.64088694 0.4816059 0.0729392
6 -0.2073807 0.79961852 2.3065579 -1.2318402
7 -0.3928079 2.42533261 0.4635502 -0.5016380
8 -0.3199929 0.85287119 1.6990239 5.1743331
9 -0.2791133 0.92473166 1.4717201 1.0347912
10 0.4941883 -0.36332096 1.3479052 -1.5726011
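-checking the result
A quick sanity check, not part of the answer above, may help confirm that every shuffled data frame keeps 'a' and 'c' untouched and still contains the same (b, d) pairs as the original. This is only a sketch: the seed value and the helper name pairs_kept are assumptions made for illustration, and the data are regenerated here so the block runs on its own.

```r
set.seed(42)  # assumed seed, used only to make this sketch reproducible

# example data in the same shape as the question's data frame
df <- data.frame(a = rnorm(10, 0, 1), b = rnorm(10, 1, 2),
                 c = rnorm(10, 2, 1), d = rnorm(10, 1, 2))

# same shuffling function as in the answer
f1 <- function(dat) {
  i1 <- sample(seq_len(nrow(dat)))
  dat[c("b", "d")] <- dat[i1, c("b", "d")]
  dat
}

lst1Out <- replicate(1000, f1(df), simplify = FALSE)

# hypothetical helper: TRUE when 'a' and 'c' are unchanged and the
# collection of (b, d) pairs matches the original, ignoring row order
pairs_kept <- function(shuffled, original) {
  key <- function(x) sort(paste(x$b, x$d))
  identical(shuffled$a, original$a) &&
    identical(shuffled$c, original$c) &&
    identical(key(shuffled), key(original))
}

# apply the check to all 1000 shuffled data frames
all_ok <- all(vapply(lst1Out, pairs_kept, logical(1), original = df))
all_ok
```

Calling set.seed() before replicate() is also what makes the 1000 shuffles repeatable between runs; without it, each run produces different permutations.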