I have a dataframe like this:
start = 0
end = 2
v1<-round ( runif ( n=20, min=start, max=end ))
v2<-round ( runif ( n=20, min=start, max=end ))
v3<-round ( runif ( n=20, min=start, max=end ))
df <- data.frame(v1,v2,v3)
I want 20 percent of the occurrences of each number in each column changed to 5. For example, if the number "1" appears 10 times in the first column, I want 20 percent of those "1"s (that is, 2 of them) converted to "5".
How can I do this in R?
Thanks for answering!
CodePudding user response:
I created a simple function that you can use inside an apply (or purrr::map or purrr::map_df) call.
# reproducibility
set.seed(4242)
start = 0
end = 2
v1 <-round(runif(n = 20, min = start, max = end))
v2 <-round(runif(n = 20, min = start, max = end))
v3 <-round(runif(n = 20, min = start, max = end))
df1 <- data.frame(v1,v2,v3)
# replace 20% (rounded down) of the occurrences of `number` in x with 5
change_number <- function(x, number) {
  idx <- which(x == number)
  n_change <- floor(length(idx) * 0.2)
  # sample positions within idx, so a single match is never read as 1:idx
  x[idx[sample.int(length(idx), n_change)]] <- 5
  x
}
# reproducibility
set.seed(42)
df2 <- apply(df1, 2, change_number, number = 1)
# if you need it returned as a data.frame instead of a matrix run the following line
# df2 <- data.frame(df2)
df2
      v1 v2 v3
 [1,]  2  0  1
 [2,]  5  5  0
 [3,]  0  0  0
 [4,]  1  1  1
 [5,]  1  1  1
 [6,]  2  1  1
 [7,]  2  2  1
 [8,]  0  5  1
 [9,]  1  1  1
[10,]  2  1  0
[11,]  0  1  2
[12,]  1  0  2
[13,]  0  2  0
[14,]  2  1  1
[15,]  2  0  2
[16,]  2  0  5
[17,]  0  2  0
[18,]  1  1  5
[19,]  1  1  2
[20,]  0  1  1
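The call above only handles the number 1, while the question asks for every number in each column. One way this could be extended is sketched below. It restates a version of change_number so the block runs on its own; change_all, df_demo and out are hypothetical names used only for illustration, and the demo data is made up so the counts are easy to check.

```r
# 20% (rounded down) of the occurrences of `number` in x become 5
change_number <- function(x, number) {
  idx <- which(x == number)
  n_change <- floor(length(idx) * 0.2)
  x[idx[sample.int(length(idx), n_change)]] <- 5
  x
}

# hypothetical wrapper: run change_number once per distinct value of x
change_all <- function(x) {
  for (number in unique(x)) x <- change_number(x, number)
  x
}

set.seed(1)
# made-up data: v1 has 5 zeros, 10 ones, 5 twos; v2 has 10 zeros, 5 ones, 5 twos
df_demo <- data.frame(v1 = rep(0:2, times = c(5, 10, 5)),
                      v2 = rep(0:2, times = c(10, 5, 5)))

out <- df_demo
out[] <- lapply(df_demo, change_all)  # out[] <- keeps the data.frame class

colSums(out == 5)  # 4 replacements per column (1 + 2 + 1 and 2 + 1 + 1)
```

Assigning into out[] rather than calling apply() avoids the conversion to a matrix, so no extra data.frame() step is needed afterwards.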
CodePudding user response:
It sounds as if you want 20% of the occurrences of every number in each column replaced with 5. In that case, try the function replaceNumbers below. The argument repl= defines the replacement, in your case 5. If you set at_least_one= to TRUE, numbers for which 20% of their occurrences is less than one still get one occurrence replaced.
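replaceNumbers <- function(x, repl, at_least_one=FALSE) {
  for (z in unique(x)) {
    w <- which(x == z)
    l <- length(w)
    s <- floor(l*.2)  ## 20% of the occurrences, rounded down
    if (s < 1 && at_least_one) s <- 1
    ## index into w so that a single position is not expanded to 1:w by sample()
    x[w[sample.int(l, s)]] <- repl
  }
  return(x)
}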
set.seed(42) ## for sake of reproducibility
res <- as.data.frame(lapply(df, replaceNumbers, repl=5, at_least_one=FALSE))
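One detail worth keeping in mind when sampling positions like this: if sample() receives a single number n as its first argument, it draws from 1:n instead of returning n. That matters whenever a value occurs exactly once in a column and at least one replacement is requested. A small sketch of the behaviour (the variable names are illustrative only):

```r
w <- 7  # pretend which() found exactly one matching position, row 7

set.seed(1)
# sample(7, 1) is read as "draw one value from 1:7"
draws <- replicate(1000, sample(w, 1))
any(draws != w)  # TRUE: rows other than 7 get picked

# sampling an index into w always returns the intended position
safe <- replicate(1000, w[sample.int(length(w), 1)])
unique(safe)     # 7
```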
Comparing results to original:
data.frame(df, res)
#    v1 v2 v3 v1.1 v2.1 v3.1
# 1   2  2  1    5    5    1
# 2   2  0  1    2    0    5
# 3   1  2  0    5    2    0
# 4   2  2  2    2    2    5
# 5   1  0  1    1    5    1
# 6   1  1  2    1    1    2
# 7   1  1  2    1    1    2
# 8   0  2  1    0    2    1
# 9   1  1  2    5    1    2
# 10  1  2  1    1    2    1
# 11  1  1  1    1    5    1
# 12  1  2  1    1    2    1
# 13  2  1  1    2    1    1
# 14  1  1  2    1    1    2
# 15  1  0  0    1    0    0
# 16  2  2  1    2    2    1
# 17  2  0  1    2    0    5
# 18  0  0  0    0    0    0
# 19  1  2  1    1    2    1
# 20  1  1  1    1    1    1
Data:
df <- structure(list(v1 = c(2, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 1, 2,
1, 1, 2, 2, 0, 1, 1), v2 = c(2, 0, 2, 2, 0, 1, 1, 2, 1, 2, 1,
2, 1, 1, 0, 2, 0, 0, 2, 1), v3 = c(1, 1, 0, 2, 1, 2, 2, 1, 2,
1, 1, 1, 1, 2, 0, 1, 1, 0, 1, 1)), class = "data.frame", row.names = c(NA,
-20L))
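To see how many replacements the 20% rule implies for each value, it may help to count the occurrences first. The sketch below uses column v1 from the data above; counts and n_replace are illustrative names.

```r
# column v1 from the data above
v1 <- c(2, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 1, 2, 1, 1, 2, 2, 0, 1, 1)

counts <- table(v1)              # occurrences of 0, 1 and 2: 2, 12 and 6
n_replace <- floor(counts * 0.2) # 20% of each count, rounded down
n_replace                        # 0 -> 0, 1 -> 2, 2 -> 1
```

With only two occurrences, 20% of the zeros rounds down to nothing, which is the case at_least_one=TRUE is meant to cover.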
CodePudding user response:
So you want 20% of the "1"s in a data.frame to become 5, correct?
First find the positions of every 1 using which(). Then sample 20% of them, and finally assign the value 5 to those positions.
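# row/column positions of every 1 in the data frame
df1s <- which(df == 1, arr.ind = TRUE)
# pick 20% of those positions (rounded down) at random
pick <- sample.int(nrow(df1s), floor(nrow(df1s) * 0.2))
# a two-column (row, col) matrix can index a data.frame for assignment
df[df1s[pick, , drop = FALSE]] <- 5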
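Note that this pools the 1s of the whole data frame, so the share replaced in any single column can differ from 20%. If the 20% should hold within each column, as the question seems to ask, a per-column loop is one possible variant. This is only a sketch on made-up data; df_demo, out, pos and pick are illustrative names.

```r
set.seed(42)
# made-up data: column v1 holds ten 1s, column v2 holds five
df_demo <- data.frame(v1 = rep(c(0, 1, 2), times = c(5, 10, 5)),
                      v2 = rep(c(0, 1, 2), times = c(10, 5, 5)))

out <- df_demo
for (col in names(out)) {
  pos <- which(out[[col]] == 1)          # positions of 1 in this column
  n_change <- floor(length(pos) * 0.2)   # 20% of them, rounded down
  pick <- pos[sample.int(length(pos), n_change)]
  out[[col]][pick] <- 5
}

colSums(out == 5)  # v1: 2, v2: 1
```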