Randomly delete rows due to a variable in a dataframe

Time:10-22

In my data below, how can I delete all rows with a given value of outcome (say "A") from n (say 1) randomly selected studies?

The only condition is that we want to select only from studies that have used more than one value of outcome (e.g., study==1 and study==2 each of which has both outcome == "A" and outcome == "B").

For example, suppose the given value of outcome is "A". Then, for a given n (say n = 1), we delete all rows with outcome == "A" from n = 1 study chosen at random from among study==1 and study==2.

Is this possible in R?

m =
  "
  study group outcome 
1     1     1       A   
2     1     1       B    
3     1     2       A 
4     1     2       B 
5     2     1       A   
6     2     1       B    
7     2     2       A 
8     2     2       B
9     3     1       B
10     4     1       B  
"  
data <- read.table(text=m,h=T)

CodePudding user response:

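library(dplyr)

n = 1
outcome_to_remove = "A"

# Only studies that used more than one value of outcome are eligible
eligible = data %>%
  group_by(study) %>%
  filter(n_distinct(outcome) > 1) %>%
  pull(study) %>%
  unique()

# Sample positions, not values: sample(x, ...) treats a single number x as 1:x
studies_to_remove = eligible[sample.int(length(eligible), size = n)]

# Drop the chosen outcome from the sampled studies; keep every other row
data %>%
  filter(
    !(
      study %in% studies_to_remove &
        outcome %in% outcome_to_remove
    )
  )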
 
# Example result when study 1 happens to be drawn:
#    study group outcome
# 2      1     1       B
# 4      1     2       B
# 5      2     1       A
# 6      2     1       B
# 7      2     2       A
# 8      2     2       B
# 9      3     1       B
# 10     4     1       B
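
If this needs to be repeated for different outcomes or values of n, the same idea can be wrapped in a function. The following is only a sketch in base R, not part of the original answer: the function name drop_outcome_from_random_studies and its arguments are hypothetical, and it assumes the columns are called study and outcome as in the question. It draws only from studies with more than one distinct outcome value and stops if n is larger than the number of such studies.

```r
# Hypothetical helper (illustrative, not from the original answer):
# remove all rows with `outcome_value` from `n` randomly chosen studies,
# drawing only from studies that used more than one outcome value.
drop_outcome_from_random_studies <- function(data, outcome_value, n = 1) {
  # Number of distinct outcome values per study (named by study)
  n_outcomes <- tapply(data$outcome, data$study, function(x) length(unique(x)))
  eligible <- names(n_outcomes)[n_outcomes > 1]

  if (n > length(eligible)) {
    stop("n exceeds the number of eligible studies")
  }

  # Sample positions rather than values, so a single eligible study
  # is never expanded to 1:study by sample()
  chosen <- eligible[sample.int(length(eligible), size = n)]

  # Rows to drop: chosen study AND the given outcome value
  drop <- as.character(data$study) %in% chosen & data$outcome == outcome_value
  data[!drop, , drop = FALSE]
}

# Same data as in the question, built directly as a data frame
data <- data.frame(
  study   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 4),
  group   = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1),
  outcome = c("A", "B", "A", "B", "A", "B", "A", "B", "B", "B")
)

set.seed(1)  # only so that the random draw is repeatable
result <- drop_outcome_from_random_studies(data, outcome_value = "A", n = 1)
print(result)
```

With the question's data and n = 1, the result always has 8 rows: both "A" rows disappear from whichever of study 1 or study 2 is drawn, and studies 3 and 4 are never touched.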