Creating a dataframe column based on a condition (with a probability dependent on age)?

I'm trying to create a synthetic dataset, but I'm struggling a bit.

Is there a way to create a column based on the values in another column?

I have a between-subjects design, and my participants are divided into two conditions (condition 1 = 0, condition 2 = 1). I want to create a column "Trial_1" (0 = Absence, 1 = Presence), but only for the participants in one of the conditions.

df <- data.frame(
  Id = seq(1, 10, by = 1),
  Age = sample(1:5, 10, replace = TRUE),
  Condition = sample(0:1, 10, replace = TRUE),
  Trial_1 = sample(0:1, 10, replace = TRUE, prob = c(0.3, 0.7)))
## BUT I want Trial_1 filled in only for participants with Condition == 1

And if there is an easy way to make the probability based on age, that would be amazing!

Thanks in advance!

CodePudding user response：

You can create df with the Id, Age, and Condition columns first, and then use rowwise() and mutate() (both from the dplyr package) to create Trial_1.

library(dplyr)

df %>%
  rowwise() %>% 
  mutate(Trial_1 = sample(0:1, 1, prob=c(1-Age/10, Condition*Age/10)))

Here, note that the probabilities of 0 and 1 are 1-Age/10 and Age/10, respectively, which makes Trial_1 age-dependent (Age runs from 1 to 5 here, so Age/10 stays between 0 and 1); change this to whatever dependence on age you would like.

Also, note that I multiply the probability corresponding to 1 by Condition, ensuring that Condition=0 rows always get 0.

Output:

      Id   Age Condition Trial_1
   <dbl> <int>     <int>   <int>
 1     1     1         0       0
 2     2     3         1       1
 3     3     1         0       0
 4     4     4         1       0
 5     5     3         1       0
 6     6     5         1       1
 7     7     4         1       0
 8     8     5         1       0
 9     9     3         0       0
10    10     2         1       0
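
The always-0 result for the Condition=0 rows relies on sample() treating prob as weights, which need not sum to 1: a weight of zero on the value 1 means it is never drawn. A quick check of that, as a minimal sketch (the seed and the age and condition values are only illustrative):

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Illustrative values: one participant with Age = 1 in Condition = 0
age <- 1
condition <- 0

# Same weights as in the mutate() call: c(1 - Age/10, Condition * Age/10)
w <- c(1 - age / 10, condition * age / 10)  # c(0.9, 0), does not sum to 1

# Draw many times; the value 1 has weight 0, so only 0 can come out
draws <- replicate(1000, sample(0:1, 1, prob = w))
all(draws == 0)
```

Note that the weights must not all be zero, so this would fail for a Condition=0 row with Age equal to 10.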

If you prefer those rows to be NA, then do something like this instead:

df %>%
  rowwise() %>% 
  mutate(Trial_1 = if_else(Condition==1, sample(0:1, 1, prob=c(1-Age/10, Age/10)), NA_integer_))

Output:

      Id   Age Condition Trial_1
   <dbl> <int>     <int>   <int>
 1     1     1         0      NA
 2     2     3         1       1
 3     3     1         0      NA
 4     4     4         1       1
 5     5     3         1       0
 6     6     5         1       1
 7     7     4         1       0
 8     8     5         1       0
 9     9     3         0      NA
10    10     2         1       0

Input:

structure(list(Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Age = c(1L, 
3L, 1L, 4L, 3L, 5L, 4L, 5L, 3L, 2L), Condition = c(0L, 1L, 0L, 
1L, 1L, 1L, 1L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))
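
If rowwise() turns out to be slow on a larger dataset, the same idea can probably be written in a vectorized way with base R's rbinom(), which accepts a vector of probabilities. This is only a sketch under the same Age/10 assumption as above; the seed and the Trial_1_na column name are my own choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Same input data as shown above
df <- data.frame(
  Id = 1:10,
  Age = c(1L, 3L, 1L, 4L, 3L, 5L, 4L, 5L, 3L, 2L),
  Condition = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L)
)

# P(Trial_1 = 1) is Age/10 when Condition == 1 and 0 when Condition == 0
p <- df$Condition * df$Age / 10

# One Bernoulli draw per row, each with its own probability
df$Trial_1 <- rbinom(nrow(df), size = 1, prob = p)

# Variant with NA instead of 0 for the Condition == 0 rows
# (Trial_1_na is a hypothetical column name)
df$Trial_1_na <- ifelse(df$Condition == 1, df$Trial_1, NA_integer_)

df
```

Since the random draws are generated differently, the individual values will not match the outputs shown above, even with the same seed.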

CodePudding user response：

I'd do it in two steps - first create the dataframe and then the Trial column. My solution isn't super elegant, but it's straightforward and doesn't require anything but base R. I hope it helps.

df <- data.frame(
  Id = seq(1, 10, by = 1),
  Age = sample(1:5, 10, replace = TRUE),
  Condition = sample(0:1, 10, replace = TRUE)
)

# start with NA everywhere so rows with Condition == 0 stay NA
df$Trial_1 <- NA_integer_
in_cond <- df$Condition == 1
df$Trial_1[in_cond] <- sample(0:1, sum(in_cond), prob = c(0.3, 0.7), replace = TRUE)
# more generally, if you want to assign to Trial_1 only when Condition is x
# df$Trial_1[df$Condition == x] <- sample(0:1, sum(df$Condition == x), prob = c(0.3, 0.7), replace = TRUE)
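
The fixed prob = c(0.3, 0.7) above does not depend on age yet. One way to add that in base R, as a rough sketch: map age to a probability with some function and pass the per-row probabilities to rbinom(). The logistic mapping p_presence() and its coefficients below are made up for illustration, and n is set much larger than the 10 rows in the question only so that the age trend is visible in the check at the end.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

n <- 5000  # larger than the question's 10 rows so the trend shows up
df <- data.frame(
  Id = seq_len(n),
  Age = sample(1:5, n, replace = TRUE),
  Condition = sample(0:1, n, replace = TRUE)
)

# Hypothetical age-to-probability mapping; the coefficients are invented
p_presence <- function(age) plogis(-2 + 0.8 * age)

# Fill Trial_1 only where Condition == 1; all other rows stay NA
in_cond <- df$Condition == 1
df$Trial_1 <- NA_integer_
df$Trial_1[in_cond] <- rbinom(sum(in_cond), size = 1,
                              prob = p_presence(df$Age[in_cond]))

# Observed share of Trial_1 == 1 per age, among Condition == 1 rows only
share_by_age <- tapply(df$Trial_1[in_cond], df$Age[in_cond], mean)
share_by_age
```

The shares should increase with age, roughly following p_presence(1:5); swap in any other function that returns values between 0 and 1.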