In my data below, I wonder how to delete all rows with a given value of outcome (say "A") from n (say 1) randomly selected studies?
The only condition is that we want to select only from studies that have used more than one value of outcome (e.g., study==1 and study==2, each of which has both outcome == "A" and outcome == "B").
For example, below let's say the given value of outcome is "A". Then, for a given n (say n = 1), we delete all rows with outcome == "A" from n = 1 randomly selected study from study==1 or study==2.
Is this possible in R?
m = "
study group outcome
1 1 1 A
2 1 1 B
3 1 2 A
4 1 2 B
5 2 1 A
6 2 1 B
7 2 2 A
8 2 2 B
9 3 1 B
10 4 1 B
"
data <- read.table(text = m, header = TRUE)
CodePudding user response:
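library(dplyr)
n = 1
outcome_to_remove = "A"
# Only studies that used more than one value of outcome are eligible
eligible_studies = data %>%
  group_by(study) %>%
  filter(n_distinct(outcome) > 1) %>%
  pull(study) %>%
  unique()
# Sample positions, so a single eligible study is not expanded to 1:study
studies_to_remove = eligible_studies[sample.int(length(eligible_studies), size = n)]
data %>%
  filter(
    !(
      study %in% studies_to_remove &
      outcome %in% outcome_to_remove
    )
  )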
# One possible result (here study 1 was drawn):
# study group outcome
# 2 1 1 B
# 4 1 2 B
# 5 2 1 A
# 6 2 1 B
# 7 2 2 A
# 8 2 2 B
# 9 3 1 B
# 10 4 1 B
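If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only a rough version under a few assumptions: drop_outcome() is a made-up helper name, the data frame is rebuilt by hand so the snippet runs on its own, and the set.seed() call (with an arbitrary seed) is added only so the random draw can be repeated.

```r
# Same example data as in the question, built directly as a data frame
data <- data.frame(
  study   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 4),
  group   = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1),
  outcome = c("A", "B", "A", "B", "A", "B", "A", "B", "B", "B")
)

# drop_outcome() is a hypothetical helper name, not part of any package
drop_outcome <- function(data, outcome_to_remove, n = 1) {
  # Number of distinct outcome values per study
  n_outcomes <- tapply(data$outcome, data$study, function(x) length(unique(x)))
  # Studies with more than one outcome value (names() gives them as character)
  eligible <- names(n_outcomes)[n_outcomes > 1]
  if (n > length(eligible)) stop("n exceeds the number of eligible studies")
  # Sample positions rather than values, so one eligible study is handled safely
  chosen <- eligible[sample.int(length(eligible), size = n)]
  # Rows to delete: the given outcome within the chosen studies
  drop <- as.character(data$study) %in% chosen & data$outcome %in% outcome_to_remove
  data[!drop, ]
}

set.seed(1)  # arbitrary seed, only to make the draw repeatable
result <- drop_outcome(data, outcome_to_remove = "A", n = 1)
result
```

With n = 1, either study 1 or study 2 loses its two "A" rows, so result should have 8 rows; studies 3 and 4 are never touched because they only have one outcome value.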