Creating a data frame column based on a condition (with a probability dependent on age)?

Time:09-29

I'm trying to create a synthetic dataset, but I'm struggling a bit.

Is there a way to create a column based on the values in another column?

I have a between-subjects design, and my participants are divided into two conditions (condition 1 = 0, condition 2 = 1). I want to create a column "Trial_1" (0 = Absence, 1 = Presence), but only for the participants in one of the conditions.

df <- data.frame(
  Id = seq(1, 10, by = 1),
  Age = sample(1:5, 10, replace = TRUE),
  Condition = sample(0:1, 10, replace = TRUE),
  Trial_1 = sample(0:1, 10, replace = TRUE, prob = c(0.3, 0.7)))
## BUT I want Trial_1 filled in only for participants with Condition == 1

And if there is an easy way to make the probability based on age, that would be amazing!

Thanks in advance!

CodePudding user response:

You can create df with the Id, Age, and Condition columns first, and then use rowwise() and mutate() (both from the dplyr package) to create Trial_1.

library(dplyr)

df %>%
  rowwise() %>% 
  mutate(Trial_1 = sample(0:1, 1, prob=c(1-Age/10, Condition*Age/10)))

Here, for rows with Condition = 1, the probabilities of 0 and 1 are 1-Age/10 and Age/10, respectively, which makes Trial_1 age-dependent; change this to whatever dependence on age you want.

Also, note that I multiply the probability corresponding to 1 by Condition, which ensures that Condition=0 rows always get 0. (sample() treats prob as weights, so they do not need to sum to 1.)

Output:

      Id   Age Condition Trial_1
   <dbl> <int>     <int>   <int>
 1     1     1         0       0
 2     2     3         1       1
 3     3     1         0       0
 4     4     4         1       0
 5     5     3         1       0
 6     6     5         1       1
 7     7     4         1       0
 8     8     5         1       0
 9     9     3         0       0
10    10     2         1       0
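The same idea can also be written without rowwise(). The following is only a sketch, assuming the same Age/10 mapping as above and swapping sample() for base R's rbinom(), which is vectorised over prob. The data frame is the input shown in this answer (with Id written as 1:10), and p_presence and the seed value are illustrative names and choices, not part of the original answer.

```r
# Input data frame from this answer (Id, Age, Condition).
df <- data.frame(
  Id = 1:10,
  Age = c(1L, 3L, 1L, 4L, 3L, 5L, 4L, 5L, 3L, 2L),
  Condition = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L)
)

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# P(Trial_1 = 1) is Age/10 when Condition is 1, and 0 when Condition is 0.
p_presence <- df$Condition * df$Age / 10

# One Bernoulli draw per row; rbinom() accepts a vector of probabilities.
df$Trial_1 <- rbinom(nrow(df), size = 1, prob = p_presence)
```

Because the probability is exactly 0 whenever Condition is 0, those rows always come out as 0, just as in the rowwise() version.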

If you prefer those rows to be NA, then do something like this instead:

df %>%
  rowwise() %>% 
  mutate(Trial_1 = if_else(Condition==1, sample(0:1, 1, prob=c(1-Age/10, Age/10)), NA_integer_))

Output:

      Id   Age Condition Trial_1
   <dbl> <int>     <int>   <int>
 1     1     1         0      NA
 2     2     3         1       1
 3     3     1         0      NA
 4     4     4         1       1
 5     5     3         1       0
 6     6     5         1       1
 7     7     4         1       0
 8     8     5         1       0
 9     9     3         0      NA
10    10     2         1       0

Input:

structure(list(Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Age = c(1L, 
3L, 1L, 4L, 3L, 5L, 4L, 5L, 3L, 2L), Condition = c(0L, 1L, 0L, 
1L, 1L, 1L, 1L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))

CodePudding user response:

I'd do it in two steps - first create the dataframe and then the Trial column. My solution isn't super elegant, but it's straightforward and doesn't require anything but base R. I hope it helps.

df <- data.frame(
  Id = seq(1, 10, by = 1),
  Age = sample(1:5, 10, replace = TRUE),
  Condition = sample(0:1, 10, replace = TRUE)
)

# Start with NA everywhere, then fill in only the Condition == 1 rows
df$Trial_1 <- NA_integer_
in_cond <- df$Condition == 1
df$Trial_1[in_cond] <- sample(0:1, sum(in_cond), prob = c(0.3, 0.7), replace = TRUE)
# more generally, to assign only where Condition is x, use in_cond <- df$Condition == x
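The base R approach above uses fixed probabilities c(0.3, 0.7). If the probability should depend on age, as asked in the question, one possible sketch is to draw with rbinom() and pass a per-row probability. The Age/10 mapping is borrowed from the other answer and is only an assumption; sim, in_cond, presence_rate, the sample size, and the seed are illustrative choices. The column is called Trial_1 as in the question. A large simulated sample is used so that the empirical rates can be compared with the intended ones.

```r
set.seed(1)  # arbitrary seed for reproducibility

n <- 10000  # hypothetical sample size, large so the empirical rates are stable
sim <- data.frame(
  Id = seq_len(n),
  Age = sample(1:5, n, replace = TRUE),
  Condition = sample(0:1, n, replace = TRUE)
)

# Rows outside the chosen condition stay NA.
in_cond <- sim$Condition == 1
sim$Trial_1 <- NA_integer_

# Assumed mapping: P(Presence) = Age / 10, one Bernoulli draw per selected row.
sim$Trial_1[in_cond] <- rbinom(sum(in_cond), size = 1, prob = sim$Age[in_cond] / 10)

# Share of Presence (1) per Age among Condition == 1 rows; should be near Age / 10.
presence_rate <- tapply(sim$Trial_1[in_cond], sim$Age[in_cond], mean)
print(round(presence_rate, 2))
```

Any other function of Age that returns values between 0 and 1 can be used in place of Age / 10.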