How to completely shuffle a column in R

Time:10-02

I want to randomly shuffle a column of numbers in R and append the result as a new column, three times. But I want each element to be shuffled to a new row each time. Say, if 1 goes to r3 for c2, it cannot go to either r1 or r3 for c3.

sample <- data.frame(1:4)

Basically, I want to produce something like this, but for larger data (c1 is the original):

   | c1 | c2 | c3 | c4
---+----+----+----+----
r1 | 1  | 4  | 3  | 2
r2 | 2  | 1  | 4  | 3
r3 | 3  | 2  | 1  | 4
r4 | 4  | 3  | 2  | 1

CodePudding user response:

Interesting question. Here's an inelegant solution, shown for the numbers 1 to 4, that is meant to work for any initial set of distinct values.

It builds each column in turn. At each step, it randomly shuffles the values of the first column and repeats the shuffle until no row repeats a value from an earlier column, so that, as you say, if 1 goes to r3 for c2, it cannot go to either r1 or r3 for c3.

library(dplyr)
initial_values <- 1:4

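# Start every new column as a copy of c1 so that each loop below runs at
# least once, whatever the initial values are (a constant 1 only works
# when 1 is among the values).
cs <- tibble(c1 = initial_values,
             c2 = initial_values,
             c3 = initial_values,
             c4 = initial_values)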

while(any(cs$c1 == cs$c2)){
  cs <- cs %>%
    mutate(c2 = sample(c1, length(c1)))
}

while(any(cs$c3 == cs$c2 | cs$c3 == cs$c1)){
  cs <- cs %>%
    mutate(c3 = sample(c1, length(c1)))
}

while(any(cs$c4 == cs$c3 | cs$c4 == cs$c2 | cs$c4 == cs$c1)){
  cs <- cs %>%
    mutate(c4 = sample(c1, length(c1)))
}

And here are the results:

# A tibble: 4 x 4
     c1    c2    c3    c4
  <int> <int> <int> <int>
1     1     4     2     3
2     2     3     1     4
3     3     2     4     1
4     4     1     3     2

With a bit of thought I'm sure you could extend this to a general function that works for any number of columns.
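
As a rough sketch of what such a general function might look like (the name shuffle_columns and its arguments are made up for illustration, and it assumes the values are distinct and that fewer new columns are requested than there are rows), the same shuffle-and-retry idea can be put in a loop over columns:

```r
# Sketch: add `n_new` shuffled columns, rejecting any shuffle that repeats
# a value already present in the same row.
# Assumptions: `values` are distinct and n_new < length(values).
shuffle_columns <- function(values, n_new) {
  n <- length(values)
  stopifnot(!anyDuplicated(values), n_new < n)
  # every column starts as a copy of the original values
  out <- matrix(values, nrow = n, ncol = n_new + 1)
  for (j in seq_len(n_new) + 1) {
    repeat {
      candidate <- values[sample.int(n)]
      # compare the candidate with every column built so far
      if (!any(out[, seq_len(j - 1), drop = FALSE] == candidate)) break
    }
    out[, j] <- candidate
  }
  colnames(out) <- paste0("c", seq_len(n_new + 1))
  as.data.frame(out)
}

res <- shuffle_columns(1:6, 3)
res
```

This should be fine when the number of new columns is small compared with the number of rows. As the two get close, most shuffles are rejected and the loops can take a long time.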

CodePudding user response:

The best method to use would depend on how many rows and columns you have (as well as whether or not duplicate values are present in c1).

Assuming there are many more rows than columns (iterations) needed, a reasonable approach could be to randomly generate permutations, throwing out anything that produces a repetition:

N = 100000
ncols = 3
sample = data.frame(c1=1:N)

orderings = data.frame(c1 = 1:N) # Initial ordering
reordering = orderings[,1]
c = 1
no_generated = 0
while (c <= ncols){
  while (sum(reordering == orderings) > 0){ # check for any repetitions
    print(sum(reordering == orderings))
    reordering = order(runif(N)) # random reordering
    no_generated = no_generated + 1
  }
  c = c + 1
  orderings[[paste0('c',c)]] = reordering
}
cat(sprintf('%d permutations generated\n', no_generated))
print(sum(apply(orderings, 1, anyDuplicated) > 0)) # Rows with a repeated value; should be zero

If the number of rows is closer to the number of columns needed, a smarter combinatorial method is probably better.
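
One possible constructive approach, sketched here under the assumption that the values are distinct: put the rows in a random cyclic order and let each new column take its value from the row a fixed number of steps ahead in that cycle, using a different non-zero step for each column. The helper name cyclic_shuffle is made up for illustration. No retries are needed, but the result is not uniformly random over all valid arrangements.

```r
# Sketch of a constructive (non-rejection) approach.
# Assumptions: `values` are distinct and n_new < length(values).
cyclic_shuffle <- function(values, n_new) {
  n <- length(values)
  stopifnot(!anyDuplicated(values), n_new < n)
  perm <- sample.int(n)               # perm[i] = row sitting at position i of the cycle
  pos <- order(perm)                  # pos[r]  = position of row r in the cycle
  shifts <- sample.int(n - 1, n_new)  # distinct non-zero offsets, one per new column
  out <- data.frame(c1 = values)
  for (j in seq_along(shifts)) {
    # row r takes the value of the row `shifts[j]` steps ahead in the cycle
    src <- perm[(pos + shifts[j] - 1) %% n + 1]
    out[[paste0("c", j + 1)]] <- values[src]
  }
  out
}

res2 <- cyclic_shuffle(1:10, 9)  # as many new columns as the row count allows
res2
```

Because the offsets are all different, no row can receive the same value twice, even when the number of columns equals the number of rows.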

CodePudding user response:

I was able to figure it out with a while loop. I know the loop condition is kind of long, but in my case I only need 3 new columns, so it wasn't too bad. I guess if the number of columns were unknown, I would need more modifications.

data   = c(1:5)
data_1 = c(1:5)
data_2 = c(1:5)
data_3 = c(1:5)

while(any(data == data_1|data == data_2|data == data_3|data_1 == data_2|data_1 == data_3|data_2 == data_3)){
  data_1 = c(sample(data,5, replace = FALSE))
  data_2 = c(sample(data,5, replace = FALSE))
  data_3 = c(sample(data,5, replace = FALSE))
}

df = data.frame(data,data_1,data_2,data_3)
df
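
For an unknown number of columns, one way to avoid writing out every pairwise comparison might be to keep the columns in a matrix and test each row for repeated values. This is only a sketch: n_new and has_row_repeat are names introduced here, and it assumes the values in data are distinct.

```r
# Sketch: the same all-at-once resampling, with the long pairwise condition
# replaced by a per-row duplicate check so the number of columns can vary.
data <- 1:5
n_new <- 3  # assumed number of extra columns

# TRUE if any row of the matrix contains the same value more than once
has_row_repeat <- function(m) any(apply(m, 1, anyDuplicated) > 0)

# start with every column equal to the original, so the loop runs at least once
m <- matrix(data, nrow = length(data), ncol = n_new + 1)
while (has_row_repeat(m)) {
  # redraw every new column, keeping the original in column 1
  m[, -1] <- replicate(n_new, data[sample.int(length(data))])
}

df <- as.data.frame(m)
df
```

Like the loop above, this redraws all columns together, so it needs many more attempts as the number of rows or columns grows.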