Applying random operations to groups of columns

Time:06-05

Suppose I have the following dataset:

id = 1:100
var_1 = rnorm(100,100,100)
var_2 = rnorm(100,100,100)
var_3 = rnorm(100,100,100)
var_4 = rnorm(100,100,100)
my_data = data.frame(id, var_1, var_2, var_3, var_4)
  • I want to randomly replace 25 of the SAME entries in var_1 and var_2 with NA
  • I want to randomly replace 20 of the SAME entries in var_3 and var_4 with NA.

I found the following code that can replace entries in a column with NA:

my_data$var_1[sample(nrow(my_data),25)]<-NA
my_data$var_2[sample(nrow(my_data),25)]<-NA
my_data$var_3[sample(nrow(my_data),20)]<-NA
my_data$var_4[sample(nrow(my_data),20)]<-NA

But is there some way to ensure that the same entries in var_1 and var_2 are replaced with NA, and the same entries in var_3 and var_4 are replaced with NA?

CodePudding user response:

Like this?

library(data.table)
setDT(my_data)
my_data[sample(.N, 25), c('var_1', 'var_2'):=NA]
my_data[sample(.N, 20), c('var_3', 'var_4'):=NA]
my_data[1:10]
##     id     var_1      var_2      var_3      var_4
##  1:  1  37.35462  37.963332         NA         NA
##  2:  2 118.36433 104.211587  268.88733  -4.729815
##  3:  3  16.43714   8.907835  258.65884 297.133739
##  4:  4 259.52808 115.802877   66.90922  61.636789
##  5:  5        NA         NA -128.52355 265.414530
##  6:  6  17.95316 276.728727  349.76616 251.221269
##  7:  7 148.74291 171.670748         NA         NA
##  8:  8 173.83247 191.017423  154.13273 156.722091
##  9:  9 157.57814 138.418536   98.66005  -2.454848
## 10: 10  69.46116 268.217608  151.01084 132.300650
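
For comparison, here is a minimal base R sketch of the same idea, assuming my_data is a plain data.frame built as in the question. The independent sample() calls in the question each draw fresh rows, so the fix is to draw the row indices once per group and reuse them for both columns. The names idx_12 and idx_34 and the seed are only illustrative.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# rebuild the example data from the question
my_data <- data.frame(
  id    = 1:100,
  var_1 = rnorm(100, 100, 100),
  var_2 = rnorm(100, 100, 100),
  var_3 = rnorm(100, 100, 100),
  var_4 = rnorm(100, 100, 100)
)

# draw each set of row indices once, then reuse it for both columns of the group
idx_12 <- sample(nrow(my_data), 25)
idx_34 <- sample(nrow(my_data), 20)

my_data[idx_12, c("var_1", "var_2")] <- NA
my_data[idx_34, c("var_3", "var_4")] <- NA

# check: the NA positions agree within each group
same_12 <- identical(is.na(my_data$var_1), is.na(my_data$var_2))
same_34 <- identical(is.na(my_data$var_3), is.na(my_data$var_4))
na_counts <- colSums(is.na(my_data))
```

The two groups are sampled independently of each other, so a row can end up NA in all four columns; only the pairing within a group is guaranteed.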

CodePudding user response:

You may use Map, which is especially helpful if the number of column groups you want to replace grows.

set.seed(42)
my_data[-1] <- Map(\(x, y) {my_data[sample.int(nrow(my_data), y), x] <- NA; my_data[x]}, 
    list(c('var_1', 'var_2'), c('var_3', 'var_4')), c(25, 20)) |> as.data.frame()

head(my_data)
#   id     var_1    var_2      var_3    var_4
# 1  1 237.09584 220.0965 -100.09292 99.53792
# 2  2  43.53018 204.4751         NA       NA
# 3  3        NA       NA         NA       NA
# 4  4 163.28626 284.8482         NA       NA
# 5  5        NA       NA  -37.68616 85.35274
# 6  6  89.38755 110.5514         NA       NA
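
If the Map call is hard to follow, the same grouping can be written as a plain loop. This is only a sketch under the question's setup; groups and n_na are hypothetical names for the column groups and the number of rows to blank in each.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# rebuild the example data from the question
my_data <- data.frame(
  id    = 1:100,
  var_1 = rnorm(100, 100, 100),
  var_2 = rnorm(100, 100, 100),
  var_3 = rnorm(100, 100, 100),
  var_4 = rnorm(100, 100, 100)
)

groups <- list(c("var_1", "var_2"), c("var_3", "var_4"))  # columns that share NA rows
n_na   <- c(25, 20)                                       # rows to blank per group

for (i in seq_along(groups)) {
  rows <- sample.int(nrow(my_data), n_na[i])  # one draw per group
  my_data[rows, groups[[i]]] <- NA            # applied to every column in the group
}

# check: within a group, each row is NA in all of the columns or in none of them
rows_consistent <- vapply(
  groups,
  function(g) all(rowSums(is.na(my_data[g])) %in% c(0, length(g))),
  logical(1)
)
```

Adding another group then means adding one entry to groups and one to n_na.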