This is very similar to some other questions, but I wasn't quite satisfied with the other answers.
I have data where one column is the outcome of a Latin Square study design, where a participant had three conditions that could have come in six possible orders. I do not have a variable that indicates the order that the participant actually received the study conditions, and so need to create one myself. Here is my current and desired output using a fake example from the first three participants:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
(current <- tibble(
participant = c(1,1,1,2,2,2,3,3,3),
block_code = c("timed", "untimed", "practice", "untimed", "practice", "timed", "timed", "untimed", "practice")
))
#> # A tibble: 9 × 2
#> participant block_code
#> <dbl> <chr>
#> 1 1 timed
#> 2 1 untimed
#> 3 1 practice
#> 4 2 untimed
#> 5 2 practice
#> 6 2 timed
#> 7 3 timed
#> 8 3 untimed
#> 9 3 practice
(desired <- current %>%
mutate(order_code = c(rep("tup", 3), rep("upt", 3), rep("tup", 3))))
#> # A tibble: 9 × 3
#> participant block_code order_code
#> <dbl> <chr> <chr>
#> 1 1 timed tup
#> 2 1 untimed tup
#> 3 1 practice tup
#> 4 2 untimed upt
#> 5 2 practice upt
#> 6 2 timed upt
#> 7 3 timed tup
#> 8 3 untimed tup
#> 9 3 practice tup
Created on 2022-02-28 by the reprex package (v2.0.1)
Participants 1 and 3 had the same order, so they ended up with the same code.
How can I tell R to create a new column based on the order of the block_code
variable within a participant?
CodePudding user response:
You can group_by(participant)
, then create order_code
by collapsing the first letter of each block_code
:
library(tidyverse)
(current %>%
group_by(participant) %>%
mutate(order_code = str_c(str_sub(block_code, end = 1), collapse = "")) %>%
ungroup())
#> # A tibble: 9 x 3
#> participant block_code order_code
#> <dbl> <chr> <chr>
#> 1 1 timed tup
#> 2 1 untimed tup
#> 3 1 practice tup
#> 4 2 untimed upt
#> 5 2 practice upt
#> 6 2 timed upt
#> 7 3 timed tup
#> 8 3 untimed tup
#> 9 3 practice tup
Created on 2022-02-28 by the reprex package (v2.0.1)
CodePudding user response:
Another slightly different option is to use summarise
so that you can drop the grouping without having to ungroup
. Here, we group by the participant
, then collapse together only the first letter for each group.
library(tidyverse)
current %>%
group_by(participant) %>%
summarise(
block_code,
order_code = paste(substr(block_code, 0, 1), collapse = ""),
.groups = "drop"
)
Output
participant block_code order_code
<dbl> <chr> <chr>
1 1 timed tup
2 1 untimed tup
3 1 practice tup
4 2 untimed upt
5 2 practice upt
6 2 timed upt
7 3 timed tup
8 3 untimed tup
9 3 practice tup
Or with data.table
:
library("data.table")
dt <- as.data.table(current)
dt[, order_code := paste(substr(block_code, 0, 1), collapse = ""), by = participant]
Or with base R:
merge(current, setNames(
aggregate(
block_code ~ participant,
data = current,
FUN = \(x) paste(substr(x, 0, 1), collapse = "")
),
c("participant", "order_code")
), by = "participant")