How to randomly change 50% of a column observations based on another column condition?-CodePudding

I'm analyzing a survey and I need to do an interaction.plot() between variable disclosure_1 and TYPE_1 to see how they affect a third variable ADTRUST.

The survey randomly showed a different scenario to each participant. DISCLOSURE_1 is the code to indicate the type of disclosure that was shown to a respondent (B = Before, D = During, A = After, N = None). TYPE_1 indicates the terminology used (DF = Deepfake, SM = Sythetic Media).

When creating the survey I dumbly only used DF for N because I thought there was no need to create an SM scenario if there was no disclosure shown (so no difference in terminology used). It still makes no sense logically, but when plotting the interaction plot the variable N does not appear. And since the study wants to analyze:

how disclosure impacts ADTRUST
how disclosure positioning impacts ADTRUST
how different terminology used in disclosure impact ADTRUST

I need to randomly substitute 50% of the observations with SM instead of DF under TYPE_1 but ONLY if the column DISCLOSURE_1 is == N.

I have no clue how to do that. Could somebody please help?

NOTE!!!! The structure is part of a bigger dataset. the dput was only done only for [25:26], so keep in mind I need to be able to precisely select the columns in the code.

Thank youu

structure(list(DISCLOSURE_1 = structure(c(4L, 3L, 1L, 3L, 1L, 
1L, 4L, 3L, 4L, 3L, 4L, 1L, 3L, 3L, 4L, 1L, 3L, 3L, 1L, 3L, 3L, 
4L, 2L, 1L, 1L, 4L, 2L, 3L, 3L, 1L, 4L, 1L, 1L, 4L, 4L, 1L, 3L, 
2L, 2L, 1L, 4L, 1L, 1L, 1L, 4L, 3L, 4L, 3L, 2L, 3L, 1L, 1L, 3L, 
1L, 1L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 
3L, 3L, 2L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, 2L, 3L, 
3L, 2L, 3L, 3L, 2L, 1L, 3L, 1L, 1L, 4L, 4L, 1L, 2L, 4L, 3L, 1L, 
1L, 1L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 4L, 2L, 4L, 2L, 3L, 3L, 2L, 
4L, 4L, 3L, 4L, 1L, 1L, 3L, 3L, 1L, 3L, 2L, 3L, 3L, 1L, 2L, 1L, 
4L, 1L, 2L, 3L, 3L, 1L, 4L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 4L, 
2L, 4L, 1L, 4L, 2L, 1L, 1L, 1L, 3L, 2L, 3L, 2L, 4L, 3L, 1L, 3L, 
1L, 1L, 3L, 2L, 3L, 2L, 4L, 3L, 3L, 1L, 1L, 3L, 2L, 2L, 1L, 1L, 
3L, 3L, 4L, 2L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 2L, 3L, 3L, 3L, 
3L, 4L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 4L, 1L, 2L, 2L, 3L, 3L, 
1L, 4L, 4L, 1L, 2L, 4L, 2L, 1L, 1L, 4L, 1L, 1L, 2L, 1L, 2L, 2L, 
3L, 3L, 3L, 3L, 2L, 1L, 4L, 1L, 1L, 2L, 2L, 4L, 2L, 3L, 1L, 2L, 
3L, 3L, 2L, 4L, 3L, 2L, 2L, 4L, 4L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 
1L, 4L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 4L, 2L, 4L, 4L, 4L, 1L, 
3L, 1L, 3L, 1L, 3L, 1L, 2L, 4L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 
2L, 1L, 2L, 4L, 1L, 2L, 3L, 2L, 1L, 2L, 4L, 4L, 2L, 3L, 2L, 1L, 
2L, 2L, 2L, 4L, 2L, 1L, 3L, 1L, 3L, 3L, 4L, 1L, 2L, 3L, 2L, 2L, 
3L, 4L, 3L, 2L, 3L, 3L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 
1L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 1L, 3L, 2L, 1L, 2L, 3L, 
2L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 1L, 3L), .Label = c("A", "B", 
"D", "N"), class = "factor"), TYPE_1 = structure(c(1L, 1L, 2L, 
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 
1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L), .Label = c("DF", 
"SM"), class = "factor")), row.names = c(NA, -367L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response：

With data as your data.frame, this will replace exactly half (rounded down) of the N's with DF with SM:

blnN <- data$DISCLOSURE_1 == "N" & data$TYPE_1 == "DF"
data$TYPE_1[sample(which(blnN), sum(blnN)/2)] <- "SM"

CodePudding user response：

If the 50% requirement is approximate, you can use runif() > 0.5

library(dplyr)
table(df)

            TYPE_1
DISCLOSURE_1 DF SM
           A 52 51
           B 52 53
           D 55 51
           N 53  0

mut <- df |>
  mutate(TYPE_1 = ifelse(DISCLOSURE_1 == "N" &
                         TYPE_1 == "DF" &
                         runif(n()) > 0.5,
                         "SM",
                         as.character(TYPE_1)))

table(mut)

            TYPE_1
DISCLOSURE_1 DF SM
           A 52 51
           B 52 53
           D 55 51
           N 27 26