I'm trying to create a synthetic dataset, but I'm struggling a bit.
Is there a way to create a column based on the values in another column?
I have a between-subjects design and my participants are divided into two conditions (condition 1 = 0, condition 2 = 1). I want to make a column "Trial_1" (0 = Absence, 1 = Presence), but only for the participants in one of the conditions.
df <- data.frame(
Id = seq(1, 10, by=1),
Age = sample(1:5, 10, replace = TRUE),
Condition = sample(0:1, 10, replace = TRUE),
Trial_1 = sample(0:1, 10, replace = TRUE, prob = c(0.3, 0.7)))
## BUT I want Trial_1 filled in only for participants with Condition = 1
And if there is an easy way to make the probability based on age, that would be amazing!
Thanks in advance!
CodePudding user response:
You can create df with the Id, Age, and Condition columns first, and then use rowwise() and mutate() (both from the dplyr package) to create Trial_1.
library(dplyr)
df %>%
rowwise() %>%
mutate(Trial_1 = sample(0:1, 1, prob=c(1-Age/10, Condition*Age/10)))
Here, note that the probabilities of 0 and 1 are 1-Age/10 and Age/10, respectively, to make them age-dependent; change this to whatever dependence on age you would like.
Also, note that I multiply the probability corresponding to 1 by Condition, ensuring that Condition=0 rows always get 0. This works because sample() treats prob as weights and rescales them, so they do not need to sum to 1.
Output:
      Id   Age Condition Trial_1
   <dbl> <int>     <int>   <int>
 1     1     1         0       0
 2     2     3         1       1
 3     3     1         0       0
 4     4     4         1       0
 5     5     3         1       0
 6     6     5         1       1
 7     7     4         1       0
 8     8     5         1       0
 9     9     3         0       0
10    10     2         1       0
If you prefer those rows to be NA, then do something like this instead:
df %>%
rowwise() %>%
mutate(Trial_1 = if_else(Condition==1, sample(0:1, 1, prob=c(1-Age/10, Age/10)), NA_integer_))
Output:
      Id   Age Condition Trial_1
   <dbl> <int>     <int>   <int>
 1     1     1         0      NA
 2     2     3         1       1
 3     3     1         0      NA
 4     4     4         1       1
 5     5     3         1       0
 6     6     5         1       1
 7     7     4         1       0
 8     8     5         1       0
 9     9     3         0      NA
10    10     2         1       0
Input:
structure(list(Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Age = c(1L,
3L, 1L, 4L, 3L, 5L, 4L, 5L, 3L, 2L), Condition = c(0L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
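As a possible alternative to the rowwise() calls above, the same idea can be sketched without row-by-row sampling. This is not the code from the answer: it swaps sample() for rbinom(), which is vectorised over prob, so one call draws a 0/1 value per row. The names df_zero and df_na and the fixed seed are assumptions made only for illustration, and Age/10 is kept as the placeholder age dependence.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

df <- data.frame(
  Id = 1:10,
  Age = sample(1:5, 10, replace = TRUE),
  Condition = sample(0:1, 10, replace = TRUE)
)

# rbinom() accepts a vector of probabilities, so no rowwise() is needed.
# P(Trial_1 = 1) is Age/10 when Condition == 1 and 0 when Condition == 0.
df_zero <- df %>%
  mutate(Trial_1 = rbinom(n(), size = 1, prob = Condition * Age / 10))

# Same kind of draw, but Condition == 0 rows become NA instead of 0.
df_na <- df %>%
  mutate(Trial_1 = if_else(Condition == 1L,
                           rbinom(n(), size = 1, prob = Age / 10),
                           NA_integer_))
```

On a data frame this small the difference hardly matters; the vectorised form mainly helps when simulating many participants.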
CodePudding user response:
I'd do it in two steps - first create the dataframe and then the Trial column. My solution isn't super elegant, but it's straightforward and doesn't require anything but base R. I hope it helps.
df <- data.frame(
Id = seq(1, 10, by = 1),
Age = sample(1:5, 10, replace = TRUE),
Condition = sample(0:1, 10, replace = TRUE)
)
df$Trial[df$Condition == 1] <- sample(0:1, sum(df$Condition), prob = c(0.3, 0.7), replace = TRUE)
# more generally, if you want to assign to Trial only when Condition is x
# df$Trial[df$Condition == x] <- sample(0:1, sum(df$Condition == x), prob = c(0.3, 0.7), replace = TRUE)
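If the probability should also depend on age, as asked in the question, the base R approach could be extended roughly as follows. This is a sketch under assumptions: the probability of Presence is taken to be Age/10 (the same placeholder used in the other answer), rbinom() stands in for sample(), and the helper name in_cond and the fixed seed are made up for the example.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

df <- data.frame(
  Id = seq(1, 10, by = 1),
  Age = sample(1:5, 10, replace = TRUE),
  Condition = sample(0:1, 10, replace = TRUE)
)

# Start with NA everywhere, then fill in only the Condition == 1 rows.
df$Trial <- NA_integer_
in_cond <- df$Condition == 1

# One Bernoulli draw per selected row, with P(Trial = 1) = Age / 10.
df$Trial[in_cond] <- rbinom(sum(in_cond), size = 1, prob = df$Age[in_cond] / 10)

# Quick sanity check: Condition 0 rows should appear only in the NA column.
table(Condition = df$Condition, Trial = df$Trial, useNA = "ifany")
```

Creating the column with NA first makes it explicit that participants in the other condition have no value for the trial.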