Change 20 percent of specific value in each column to a number

Time:11-22

I have a dataframe like this:

start = 0
end = 2

v1 <- round(runif(n = 20, min = start, max = end))
v2 <- round(runif(n = 20, min = start, max = end))
v3 <- round(runif(n = 20, min = start, max = end))

df <- data.frame(v1,v2,v3)

I want 20 percent of the occurrences of each number in each column changed to 5. For example, if the number "1" occurs 10 times in the first column, I want 20 percent of those occurrences (2 of them) converted to "5".

How can I do this in R?

Thanks for answering!

CodePudding user response:

I created a simple function that you can use inside apply() (or purrr::map() or purrr::map_df()):

# reproducibility
set.seed(4242)
start = 0
end = 2

v1 <-round(runif(n = 20, min = start, max = end))
v2 <-round(runif(n = 20, min = start, max = end))
v3 <-round(runif(n = 20, min = start, max = end))

df1 <- data.frame(v1,v2,v3)

# replace the value with 5 for the chosen number in the vectors
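change_number <- function(x, number) {
  # positions where the column equals the chosen number
  idx <- which(x == number)
  # pick 20% of those positions (rounded down) without replacement
  pick <- idx[sample.int(length(idx), floor(length(idx) * 0.2))]
  replace(x, pick, 5)
}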

# reproducibility
set.seed(42) 

df2 <- apply(df1, 2, change_number, number = 1)
# if you need it returned as a data.frame instead of a matrix run the following line
# df2 <- data.frame(df2)
df2

     v1 v2 v3
 [1,]  2  0  1
 [2,]  5  5  0
 [3,]  0  0  0
 [4,]  1  1  1
 [5,]  1  1  1
 [6,]  2  1  1
 [7,]  2  2  1
 [8,]  0  5  1
 [9,]  1  1  1
[10,]  2  1  0
[11,]  0  1  2
[12,]  1  0  2
[13,]  0  2  0
[14,]  2  1  1
[15,]  2  0  2
[16,]  2  0  5
[17,]  0  2  0
[18,]  1  1  5
[19,]  1  1  2
[20,]  0  1  1
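
As noted in the comment above, apply() returns a matrix. One possible way to keep a data.frame throughout is to assign the result of lapply() back into the columns. The following is only a sketch: the arguments frac and repl are hypothetical additions for illustration, and the function is repeated so the block runs on its own.

```r
# Example data built the same way as in the question
set.seed(4242)
df1 <- data.frame(
  v1 = round(runif(20, min = 0, max = 2)),
  v2 = round(runif(20, min = 0, max = 2)),
  v3 = round(runif(20, min = 0, max = 2))
)

# Same idea as change_number above; frac and repl are hypothetical
# parameters added here for illustration
change_number <- function(x, number, frac = 0.2, repl = 5) {
  idx <- which(x == number)
  pick <- idx[sample.int(length(idx), floor(length(idx) * frac))]
  replace(x, pick, repl)
}

set.seed(42)
df2 <- df1                                        # work on a copy
df2[] <- lapply(df1, change_number, number = 1)   # df2[] keeps the data.frame class

str(df2)
```

Because lapply() works column by column, each column has 20% of its own 1s (rounded down) replaced.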

CodePudding user response:

It sounds as if you want 20% of the occurrences of every number in each column replaced with 5. If so, try the function replaceNumbers below. The argument repl= defines the replacement, in your case 5. If you set at_least_one= to TRUE, numbers for which 20% of their occurrences is less than one still get one occurrence replaced.

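replaceNumbers <- function(x, repl, at_least_one=FALSE) {
  for (z in unique(x)) {
    w <- which(x == z)
    l <- length(w)
    # number of occurrences to replace: 20%, rounded down
    s <- floor(l*.2)
    # optionally replace one occurrence when 20% is less than one
    if (s < 1 && at_least_one) s <- 1
    # index into w so that a single position is not expanded to 1:w by sample()
    x[w[sample.int(l, s)]] <- repl
  }
  return(x)
}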

set.seed(42)  ## for sake of reproducibility
res <- as.data.frame(lapply(df, replaceNumbers, repl=5, at_least_one=FALSE))

Comparing results to original:

data.frame(df, res)
#    v1 v2 v3 v1.1 v2.1 v3.1
# 1   2  2  1    5    5    1
# 2   2  0  1    2    0    5
# 3   1  2  0    5    2    0
# 4   2  2  2    2    2    5
# 5   1  0  1    1    5    1
# 6   1  1  2    1    1    2
# 7   1  1  2    1    1    2
# 8   0  2  1    0    2    1
# 9   1  1  2    5    1    2
# 10  1  2  1    1    2    1
# 11  1  1  1    1    5    1
# 12  1  2  1    1    2    1
# 13  2  1  1    2    1    1
# 14  1  1  2    1    1    2
# 15  1  0  0    1    0    0
# 16  2  2  1    2    2    1
# 17  2  0  1    2    0    5
# 18  0  0  0    0    0    0
# 19  1  2  1    1    2    1
# 20  1  1  1    1    1    1

Data:

df <- structure(list(v1 = c(2, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 1, 2, 
1, 1, 2, 2, 0, 1, 1), v2 = c(2, 0, 2, 2, 0, 1, 1, 2, 1, 2, 1, 
2, 1, 1, 0, 2, 0, 0, 2, 1), v3 = c(1, 1, 0, 2, 1, 2, 2, 1, 2, 
1, 1, 1, 1, 2, 0, 1, 1, 0, 1, 1)), class = "data.frame", row.names = c(NA, 
-20L))
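
As a quick check of the 20% arithmetic, the counts for column v1 of the data above can be worked out by hand. This is just an illustrative sketch; the names counts and n_replace are made up for the example.

```r
# Column v1 from the data above
v1 <- c(2, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 1, 2, 1, 1, 2, 2, 0, 1, 1)

counts <- table(v1)                 # occurrences of each distinct value
n_replace <- floor(counts * 0.2)    # 20% of each count, rounded down

counts
# v1
#  0  1  2
#  2 12  6
n_replace
# v1
# 0 1 2
# 0 2 1
```

So v1 should end up with two 1s and one 2 replaced and both 0s left alone, which agrees with the three 5s in the v1.1 column of the comparison above.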

CodePudding user response:

So you want 20% of the "1" in a data.frame to become 5, correct?

First let's get the positions of every number 1 using which(). Then take 20% of them, and finally assign the value 5 to those positions.

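# (row, col) positions of every 1; a plain vector index would select columns of a data.frame
df1s <- which(df == 1, arr.ind = TRUE)
# keep a random 20% of those positions (rounded down)
df1s <- df1s[sample.int(nrow(df1s), floor(nrow(df1s) * 0.2)), , drop = FALSE]
df[df1s] <- 5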
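
Note that the snippet above samples positions across the whole data frame, so a single column may end up with more or less than 20% of its 1s replaced. If the 20% should hold per column, one possible variant is to repeat the same which()/sample step inside a loop over the columns. The small data frame below is a hypothetical example, not the data from the question.

```r
# Hypothetical small example: v1 contains six 1s, v2 contains five
df <- data.frame(
  v1 = c(1, 1, 1, 1, 1, 0, 2, 2, 0, 1),
  v2 = c(0, 1, 2, 1, 1, 1, 1, 0, 2, 2)
)

set.seed(42)  # reproducibility
for (col in names(df)) {
  ones <- which(df[[col]] == 1)                   # positions of 1 in this column
  k <- floor(length(ones) * 0.2)                  # 20% of them, rounded down
  pick <- ones[sample.int(length(ones), k)]       # random subset of positions
  df[[col]][pick] <- 5
}

colSums(df == 5)  # number of replaced values per column
```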
Tags: r