I have a dataset like this, where Count is the running count of consecutive correct responses (1s in Correct) for each participant:
Participant TrialNumber Correct Count
118 1 1 1
118 2 1 2
118 3 1 3
118 4 1 4
118 5 1 5
120 1 1 1
120 2 0 0
120 3 0 0
120 4 1 1
120 5 1 2
121 1 1 1
121 2 1 2
121 3 1 3
121 4 1 4
121 5 0 0
I need to find the trial numbers where Count is 4 for each participant, and write 0 for any participant who never reaches a count of 4 (e.g. participant 120).
This is the code I have so far:
df <- df[with(df, !(Count > 4)), ]
df$Performance <- ifelse(df$Count == 4, df$TrialNumber, NA)
df_Performance <- aggregate(Performance ~ Participant, data = df, first)
The only problem is that I lose any participant who doesn't have a count of 4.
I think I need to add another ifelse statement to the second line, so that if a participant has no 4s, the first trial gets 0 and all other trials get NA.
I'm stuck on how to write this part. Any suggestions would be much appreciated, thank you.
CodePudding user response:
library(tidyverse)
data <-
  tibble::tribble(
    ~Participant, ~TrialNumber, ~Correct, ~Count,
    118, 1, 1, 1,
    118, 2, 1, 2,
    118, 3, 1, 3,
    118, 4, 1, 4,
    118, 5, 1, 5,
    120, 1, 1, 1,
    120, 2, 0, 0,
    120, 3, 0, 0,
    120, 4, 1, 1,
    120, 5, 1, 2,
    121, 1, 1, 1,
    121, 2, 1, 2,
    121, 3, 1, 3,
    121, 4, 1, 4,
    121, 5, 0, 0
  )
data %>%
  group_by(Participant) %>%
  mutate(
    has_four = 4 %in% Count
  ) %>%
  distinct(has_four) %>%
  transmute(
    Participant,
    performance = ifelse(has_four, 4, 0)
  )
#> # A tibble: 3 × 2
#> # Groups: Participant [3]
#> Participant performance
#> <dbl> <dbl>
#> 1 118 4
#> 2 120 0
#> 3 121 4
Created on 2021-09-16 by the reprex package (v2.0.1)
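Note that the pipeline above writes the constant 4 whenever a participant has a count of 4, which matches the trial number here only because the streak starts on trial 1. If the trial number itself is wanted, a minimal sketch along the same lines could look like this. It assumes that only the first trial with Count == 4 matters, and it rebuilds the example data as a plain data frame so that it runs on its own:

```r
library(dplyr)

# Same example values as above, rebuilt compactly
data <- data.frame(
  Participant = rep(c(118, 120, 121), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Correct     = c(1, 1, 1, 1, 1,  1, 0, 0, 1, 1,  1, 1, 1, 1, 0),
  Count       = c(1, 2, 3, 4, 5,  1, 0, 0, 1, 2,  1, 2, 3, 4, 0)
)

result <- data %>%
  group_by(Participant) %>%
  summarise(
    # match() returns the position of the first Count == 4, or NA if there is
    # none; indexing with NA yields NA, which coalesce() then replaces with 0
    performance = coalesce(TrialNumber[match(4, Count)], 0L)
  )

result
```

With this data the result is again 4, 0 and 4 for participants 118, 120 and 121, but the value now comes from TrialNumber rather than being hard-coded.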
CodePudding user response:
Here is another tidyverse approach. After grouping by Participant, use summarise and designate the new column to include the TrialNumber where the Count is 4. If none of the trials have a 4 (using any), then include 0. This would assume you have only one trial of Count equaling 4 for any participant.
library(tidyverse)
df %>%
  group_by(Participant) %>%
  summarise(Performance = ifelse(any(Count == 4), TrialNumber[Count == 4], 0))
Output
Participant Performance
<int> <dbl>
1 118 4
2 120 0
3 121 4
Data
df <- structure(list(Participant = c(118L, 118L, 118L, 118L, 118L,
120L, 120L, 120L, 120L, 120L, 121L, 121L, 121L, 121L, 121L),
TrialNumber = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L), Correct = c(1L, 1L, 1L, 1L, 1L, 1L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L), Count = c(1L, 2L, 3L, 4L,
5L, 1L, 0L, 0L, 1L, 2L, 1L, 2L, 3L, 4L, 0L)), class = "data.frame", row.names = c(NA,
-15L))
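If the one-trial-per-participant assumption does not hold (for example, a participant reaches a count of 4 in two separate streaks), one possible sketch is to keep every matching trial and join the matches back onto the full participant list, so that nobody is dropped. The names hits and result are just illustrative, and df is rebuilt here with the same values as in the Data section:

```r
library(dplyr)

# Same values as the df in the Data section, rebuilt compactly
df <- data.frame(
  Participant = rep(c(118L, 120L, 121L), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Correct     = c(1L, 1L, 1L, 1L, 1L,  1L, 0L, 0L, 1L, 1L,  1L, 1L, 1L, 1L, 0L),
  Count       = c(1L, 2L, 3L, 4L, 5L,  1L, 0L, 0L, 1L, 2L,  1L, 2L, 3L, 4L, 0L)
)

# One row per trial on which Count is 4 (possibly several per participant)
hits <- df %>%
  filter(Count == 4) %>%
  select(Participant, Performance = TrialNumber)

# Start from every participant, attach their matching trials,
# and fill in 0 for participants with no match
result <- df %>%
  distinct(Participant) %>%
  left_join(hits, by = "Participant") %>%
  mutate(Performance = coalesce(Performance, 0L))

result
```

With the example data each participant has at most one such trial, so this gives the same three rows as the Output above. A participant with several qualifying trials would simply get several rows.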
CodePudding user response:
Here's a base R approach.
data$Performance <- ifelse(data$Count == 4, data$TrialNumber, 0)
aggregate(x = data$Performance, by = list(Participant = data$Participant), FUN = sum)
#> Participant x
#> 1 118 4
#> 2 120 0
#> 3 121 4
CodePudding user response:
Using base R aggregate with if/else.
aggregate(Count~Participant, df, function(x) if(any(x == 4)) 4 else 0)
# Participant Count
#1 118 4
#2 120 0
#3 121 4
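As a side note on why the code in the question loses participants: the formula interface of aggregate defaults to na.action = na.omit, so every row where Performance is NA is removed before grouping, and a participant whose rows are all NA disappears. The question also calls first, which is not a base R function (presumably it came from a loaded package such as dplyr). Below is a minimal sketch that keeps the question's ifelse step and returns the trial number instead of a constant. The helper first_or_zero is a hypothetical name introduced only for this example:

```r
# Same example values as in the question
df <- data.frame(
  Participant = rep(c(118L, 120L, 121L), each = 5),
  TrialNumber = rep(1:5, times = 3),
  Count       = c(1L, 2L, 3L, 4L, 5L,  1L, 0L, 0L, 1L, 2L,  1L, 2L, 3L, 4L, 0L)
)

# As in the question: trial number where Count is 4, NA everywhere else
df$Performance <- ifelse(df$Count == 4, df$TrialNumber, NA)

# Hypothetical helper: first non-NA value, or 0 if every value is NA
first_or_zero <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else 0L
}

# na.pass keeps the NA rows, so participants without a 4 still form a group
df_Performance <- aggregate(Performance ~ Participant, data = df,
                            FUN = first_or_zero, na.action = na.pass)

df_Performance
```

Participant 120 is now kept with a Performance of 0, while 118 and 121 get trial number 4.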