I have a data frame with dimensions 24,523 x 3,468, and I want to shuffle its entries. For example, with a small data frame I can shuffle one column like this:
df <- data.frame(c1=c(1, 1.5, 2, 4), c2=c(1.1, 1.6, 3, 3.2), c3=c(2.1, 2.4, 1.4, 1.7))
df_shuffled = transform(df, c2 = sample(c2))
This works for one column, but I want to shuffle all columns, or all rows. I tried:
col = colnames(df)
for (i in 1:ncol(df)){
df2 = transform(df, col[i] = sample(col[i]))
}
df2
It produces an error.
I also tried the following, but it only permutes whole rows and whole columns, not the individual entries:
df_shuf = df[sample(rownames(df), nrow(df)), sample(colnames(df), ncol(df))]
df_shuf
How can I shuffle the entries of the data frame df using a for loop over its rows and columns?
Answer:
One way to solve your problem:
df[] = lapply(df, sample)
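A small self-contained sketch of this answer on the example frame from the question. `set.seed()` is only there to make the run reproducible, and `df_shuffled` is an illustrative name. Assigning into `df_shuffled[]` rather than to `df_shuffled` keeps the result a data.frame, because `lapply()` on its own returns a plain list.

```r
# Example frame from the question
df <- data.frame(c1 = c(1, 1.5, 2, 4),
                 c2 = c(1.1, 1.6, 3, 3.2),
                 c3 = c(2.1, 2.4, 1.4, 1.7))

set.seed(42)                # only for a reproducible shuffle
df_shuffled <- df
# lapply() shuffles each column independently and returns a list;
# filling df_shuffled[] keeps the data.frame class and the column names
df_shuffled[] <- lapply(df_shuffled, sample)
str(df_shuffled)
```

One caveat: when a column has length 1, `sample(x)` treats a single number `x >= 1` as `1:x`, so for a one-row frame a safer per-column function is `function(x) x[sample.int(length(x))]`.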
Answer:
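While I find the lapply(df, sample) method the most straightforward (and canonical), a literal fix to your for loop is to recognize that transform cannot use col[i] on the left-hand side of =: transform takes the new column names from its argument names, and an argument name must be a literal name, not an expression such as col[i]. You can instead assign with df2[[ col[i] ]]. Note also that this version updates df2 on every iteration; your loop rebuilt df2 from df each time, so at best only the last column would have ended up shuffled: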
df2 <- df
col = colnames(df)
for (i in 1:ncol(df)) {
df2[[ col[i] ]] = sample(df2[[ col[i] ]])
}
We don't really need the names, though; you can use indices instead:
df2 <- df
for (i in 1:ncol(df)) {
df2[[ i ]] = sample(df2[[ i ]])
}
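The question also mentions shuffling by rows. If that means mixing the values within each row (each row keeps its own values, which move between columns), the same loop pattern works over row indices. This is a sketch that assumes every column is numeric, so a row can be flattened with `unlist()`; `df_rows` and `vals` are illustrative names.

```r
# Example frame from the question
df <- data.frame(c1 = c(1, 1.5, 2, 4),
                 c2 = c(1.1, 1.6, 3, 3.2),
                 c3 = c(2.1, 2.4, 1.4, 1.7))

set.seed(1)                 # only for a reproducible shuffle
df_rows <- df
for (i in seq_len(nrow(df_rows))) {
  vals <- unname(unlist(df_rows[i, ]))   # row i as a plain numeric vector
  # sample.int() permutes positions, so a single value is never expanded to 1:x
  df_rows[i, ] <- vals[sample.int(length(vals))]
}
df_rows
```

A loop over 24,523 rows calls the data.frame assignment method once per row and may be slow; for an all-numeric frame, the commonly used idiom `df[] <- t(apply(df, 1, sample))` does the same in one step (`apply()` returns one column per row, hence the transpose).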
This assumes, of course, that you intend to discard the correlation between values in a row. For instance, the minimum values of columns c1 and c2 occur together in row 1; after sampling, however, they may end up in different rows.
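Taking the per-column shuffle one step further: if every value should be free to land in any cell, so that both the row and the column association are discarded, one option for an all-numeric frame is to permute the flattened values and refill the frame. This is a sketch under that assumption; `m` and `df_all` are illustrative names. On the full 24,523 x 3,468 frame the intermediate matrix holds about 85 million doubles (roughly 680 MB), so memory may be a concern.

```r
# Example frame from the question
df <- data.frame(c1 = c(1, 1.5, 2, 4),
                 c2 = c(1.1, 1.6, 3, 3.2),
                 c3 = c(2.1, 2.4, 1.4, 1.7))

set.seed(3)                       # only for a reproducible shuffle
m <- as.matrix(df)                # assumes all columns are numeric
m[] <- m[sample.int(length(m))]   # permute all cells at once, keeping the dimensions
df_all <- as.data.frame(m)        # back to a data.frame with the same column names
df_all
```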
If your intent is to keep each row together, then we would just need to sample the rows, preserving the "observation" quality of the frame:
df2 <- df[sample(nrow(df)),]
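A short sketch showing that this keeps each observation intact: whole rows move, and the original row names move with them, so sorting by those row names recovers the original frame. `perm` and `restored` are illustrative names, and `set.seed()` only makes the run reproducible.

```r
# Example frame from the question
df <- data.frame(c1 = c(1, 1.5, 2, 4),
                 c2 = c(1.1, 1.6, 3, 3.2),
                 c3 = c(2.1, 2.4, 1.4, 1.7))

set.seed(7)
perm <- sample(nrow(df))     # a random order of the row indices
df2 <- df[perm, ]            # whole rows move; their row names come along
df2

# Ordering by the carried-over row names puts the rows back in place
restored <- df2[order(as.integer(rownames(df2))), ]
```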