Adding a random value to each value in the data, conditional on the column means

Time:09-19

I have example data as follows:

library(data.table)
sample <- fread("
A,0,L,2,2
B,4,K,3,0
")
names(sample) <- c("type", "value1", "cat", "value2", "value3")

I would like to add a random value to every numeric value in the dataset. Each random value should be drawn with a mean of 0 and a standard deviation of 0.1 times the mean of its column.

I made a start as follows, but I am confused about how to store the values and how to use the apply functions correctly here.

fetch_cols <- which(sapply(dat, is.numeric))
lapply(sample[, fetch_cols], rnorm(1, 0, 0.1*mean(sample[, fetch_cols])))

Desired output would be something like:

library(data.table)
sample <- fread("
A,-0.2,L,1.9,2.15
B,4.2,K,3.1,-.05
")

CodePudding user response:

With dplyr you can use across:

library(dplyr)
sample %>%
  mutate(across(where(is.numeric), \(x) x + rnorm(n(), mean = 0, sd = 0.1 * mean(x))))

Or with data.table:

library(data.table)
num_cols <- names(sample)[sapply(sample, is.numeric)]
sample[, 
  (num_cols) := lapply(.SD, \(x) x + rnorm(.N, mean = 0, sd = 0.1 * mean(x))),
  .SDcols = num_cols
]
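
Note that := modifies the data.table by reference, so the original values are overwritten, and the result differs on every run. A minimal sketch of keeping the original table intact and making the noise repeatable, assuming the example data is built directly with data.table() rather than fread; the names dt and noisy and the seed value are arbitrary choices, not from the question:

```r
library(data.table)

# Example data from the question, built directly instead of via fread
dt <- data.table(
  type   = c("A", "B"),
  value1 = c(0, 4),
  cat    = c("L", "K"),
  value2 = c(2, 3),
  value3 = c(2, 0)
)
num_cols <- names(dt)[sapply(dt, is.numeric)]

set.seed(123)      # arbitrary seed, chosen only so the noise is repeatable
noisy <- copy(dt)  # copy() so that := does not alter dt itself
noisy[, (num_cols) := lapply(.SD, \(x) x + rnorm(.N, mean = 0, sd = 0.1 * mean(x))),
      .SDcols = num_cols]

# dt still holds the original values; noisy holds the perturbed ones
print(dt)
print(noisy)
```

Without copy(), noisy and dt would refer to the same table and both would show the perturbed values.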

CodePudding user response:

You could use lapply to conditionally add the values you want:

sample[] <- lapply(sample, function(x) {
  if(!is.numeric(x)) x else x + rnorm(length(x), 0, 0.1 * mean(x))
  })
  
#>    type   value1 cat   value2    value3
#> 1:    A 0.147823   L 2.324099 1.8397374
#> 2:    B 4.077322   K 2.799110 0.0933251
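
One caveat: rnorm() returns NaN (with a warning) when sd is negative, which would happen here for any column whose mean is below zero. A minimal sketch of one way around that, assuming the absolute value of the column mean is an acceptable scale; the helper name add_noise and the negative example value are hypothetical, not from the question:

```r
# Hypothetical helper: adds normal noise with mean 0 to one numeric vector,
# using sd = scale * |mean(x)| so the standard deviation is never negative
add_noise <- function(x, scale = 0.1) {
  x + rnorm(length(x), mean = 0, sd = scale * abs(mean(x)))
}

# Small example with a column whose mean is negative (made-up values)
df <- data.frame(
  type   = c("A", "B"),
  value1 = c(0, -4),
  value2 = c(2, 3),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only for repeatability
is_num <- sapply(df, is.numeric)
df[is_num] <- lapply(df[is_num], add_noise)
print(df)
```

A column whose mean is exactly 0 gets sd = 0, so its values are left unchanged.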