How to select n random values from each row of a dataframe in R?

I have a dataframe

df= data.frame(a=c(56,23,15,10),
              b=c(43,NA,90.7,30.5),
              c=c(12,7,10,2),
              d=c(1,2,3,4),
              e=c(NA,45,2,NA))

I want to keep two randomly chosen non-NA values in each row and set the rest to NA.

Required output (will differ because of randomness):

df= data.frame(
              a=c(56,NA,15,NA),
              b=c(43,NA,NA,NA),
              c=c(NA,7,NA,2),
              d=c(NA,NA,3,4),
              e=c(NA,45,NA,NA))

Code used
I know how to pick two random non-NA positions in a specific row:

set.seed(2)
sample(which(!is.na(df[1,])),2)

But I have no idea how to apply this to the whole dataframe and get the required output.

CodePudding user response:

You may write a function to keep n random values in a row.

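keep_n_value <- function(x, n) {
  # positions of the non-NA values in the row
  # (assumes the row has at least n of them)
  x1 <- which(!is.na(x))
  # keep n randomly chosen positions, set every other position to NA
  x[-sample(x1, n)] <- NA
  x
}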

Apply the function by row using base R -

set.seed(123)
df[] <- t(apply(df, 1, keep_n_value, 2))
df
#   a    b  c  d  e
#1 NA   NA 12  1 NA
#2 NA   NA  7  2 NA
#3 NA 90.7 10 NA NA
#4 NA 30.5 NA  4 NA

Or if you prefer tidyverse -

purrr::pmap_df(df, ~keep_n_value(c(...), 2))
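
The function above assumes every row holds at least n non-NA values. It also relies on `sample(x1, n)`, which in R samples from `1:x1` when `x1` is a single number, so a row with exactly one non-NA value could be handled incorrectly. A minimal sketch of a more defensive variant follows; `keep_n_safe` is a hypothetical name, not part of the original answer, and returning short rows unchanged is an assumption about the desired behaviour.

```r
# Hypothetical helper: keep at most n non-NA values in a vector.
keep_n_safe <- function(x, n) {
  idx <- which(!is.na(x))
  # Assumption: rows with n or fewer non-NA values are returned unchanged.
  if (length(idx) <= n) return(x)
  # Sample positions within idx, which avoids the sample(5, 1) == sample(1:5, 1) pitfall.
  keep <- idx[sample.int(length(idx), n)]
  # Blank every position that was not selected.
  x[setdiff(seq_along(x), keep)] <- NA
  x
}

df <- data.frame(a = c(56, 23, 15, 10),
                 b = c(43, NA, 90.7, 30.5),
                 c = c(12, 7, 10, 2),
                 d = c(1, 2, 3, 4),
                 e = c(NA, 45, 2, NA))

set.seed(123)
res <- df
res[] <- t(apply(df, 1, keep_n_safe, n = 2))

# Number of values kept in each row
rowSums(!is.na(res))
```

Indexing through `sample.int(length(idx), n)` is one common way around the single-number behaviour of `sample()`; other approaches would work equally well.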

CodePudding user response:

Base R:

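You could try a column-wise apply (sapply) that randomly replaces two non-NA values in each column with NA. Note that this works per column, so it does not guarantee exactly two remaining values per row: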

as.data.frame(sapply(df, function(x) replace(x, sample(which(!is.na(x)), 2), NA)))

Example Output:

   a    b  c  d  e
1 56   NA 12 NA NA
2 23   NA NA  2 NA
3 NA   NA 10  3 NA
4 NA 30.5 NA NA NA
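
Since the question asks for two kept values per row, a row-wise variant of the same `replace` idea might look like the sketch below. It is an adaptation, not the answerer's code, and it assumes every row has at least two non-NA values (true for the sample data).

```r
df <- data.frame(a = c(56, 23, 15, 10),
                 b = c(43, NA, 90.7, 30.5),
                 c = c(12, 7, 10, 2),
                 d = c(1, 2, 3, 4),
                 e = c(NA, 45, 2, NA))

set.seed(1)
res <- as.data.frame(t(apply(df, 1, function(x) {
  idx <- which(!is.na(x))
  # Choose all but two of the non-NA positions to blank out.
  # Assumption: length(idx) >= 2, otherwise sample.int() fails.
  drop <- idx[sample.int(length(idx), length(idx) - 2)]
  replace(x, drop, NA)
})))

# Number of values kept in each row
rowSums(!is.na(res))
```

`apply(df, 1, ...)` returns one column per input row, hence the `t()` before converting back to a data frame.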

CodePudding user response:

One option using dplyr and purrr could be:

library(dplyr)
library(purrr)

df %>%
    mutate(pmap_dfr(across(everything()), ~ `[<-`(c(...), !seq_along(c(...)) %in% sample(which(!is.na(c(...))), 2), NA)))

   a    b  c  d  e
1 56 43.0 NA NA NA
2 23   NA  7 NA NA
3 15   NA NA NA  2
4 NA 30.5  2 NA NA
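
The one-liner packs several steps into a single expression. As a rough illustration, the same logic for a single row can be unpacked in base R as follows; the variable names `row`, `keep`, `mask` and `out` are introduced here only for the example.

```r
# First row of df as a named vector
row <- c(a = 56, b = 43, c = 12, d = 1, e = NA)

set.seed(1)
# Two randomly chosen positions among the non-NA values
keep <- sample(which(!is.na(row)), 2)
# TRUE for every position that was not chosen
# (%in% binds tighter than !, so this is !(seq_along(row) %in% keep))
mask <- !seq_along(row) %in% keep
# Functional form of the assignment: same as row[mask] <- NA; row
out <- `[<-`(row, mask, NA)
out
```

`pmap_dfr()` then performs this for every row and binds the resulting named vectors back into a data frame.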