Home > Software design >  In R, create a new column based on the order of a 2nd column grouped by a 3rd column
In R, create a new column based on the order of a 2nd column grouped by a 3rd column

Time:03-01

This is very similar to some other questions, but I wasn't quite satisfied with the other answers.

I have data where one column is the outcome of a Latin Square study design, where a participant had three conditions that could have come in six possible orders. I do not have a variable that indicates the order that the participant actually received the study conditions, and so need to create one myself. Here is my current and desired output using a fake example from the first three participants:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
(current <- tibble(
    participant = c(1,1,1,2,2,2,3,3,3),
    block_code = c("timed", "untimed", "practice", "untimed", "practice", "timed", "timed", "untimed", "practice")
    ))
#> # A tibble: 9 × 2
#>   participant block_code
#>         <dbl> <chr>     
#> 1           1 timed     
#> 2           1 untimed   
#> 3           1 practice  
#> 4           2 untimed   
#> 5           2 practice  
#> 6           2 timed     
#> 7           3 timed     
#> 8           3 untimed   
#> 9           3 practice
(desired <- current %>%
    mutate(order_code = c(rep("tup", 3), rep("upt", 3), rep("tup", 3))))
#> # A tibble: 9 × 3
#>   participant block_code order_code
#>         <dbl> <chr>      <chr>     
#> 1           1 timed      tup       
#> 2           1 untimed    tup       
#> 3           1 practice   tup       
#> 4           2 untimed    upt       
#> 5           2 practice   upt       
#> 6           2 timed      upt       
#> 7           3 timed      tup       
#> 8           3 untimed    tup       
#> 9           3 practice   tup

Created on 2022-02-28 by the reprex package (v2.0.1)

Participants 1 and 3 had the same order, so they ended up with the same code.

How can I tell R to create a new column based on the order of the block_code variable within a participant?

CodePudding user response:

You can group_by(participant), then create order_code by collapsing the first letter of each block_code:

library(tidyverse)

(current %>% 
  group_by(participant) %>% 
  mutate(order_code = str_c(str_sub(block_code, end = 1), collapse = "")) %>% 
  ungroup())
#> # A tibble: 9 x 3
#>   participant block_code order_code
#>         <dbl> <chr>      <chr>     
#> 1           1 timed      tup       
#> 2           1 untimed    tup       
#> 3           1 practice   tup       
#> 4           2 untimed    upt       
#> 5           2 practice   upt       
#> 6           2 timed      upt       
#> 7           3 timed      tup       
#> 8           3 untimed    tup       
#> 9           3 practice   tup

Created on 2022-02-28 by the reprex package (v2.0.1)

CodePudding user response:

Another slightly different option is to use summarise so that you can drop the grouping without having to ungroup. Here, we group by the participant, then collapse together only the first letter for each group.

library(tidyverse)

current %>%
  group_by(participant) %>%
  summarise(
    block_code,
    order_code = paste(substr(block_code, 0, 1), collapse = ""),
    .groups = "drop"
  )

Output

  participant block_code order_code
        <dbl> <chr>      <chr>     
1           1 timed      tup       
2           1 untimed    tup       
3           1 practice   tup       
4           2 untimed    upt       
5           2 practice   upt       
6           2 timed      upt       
7           3 timed      tup       
8           3 untimed    tup       
9           3 practice   tup    

Or with data.table:

library("data.table")
dt <- as.data.table(current)

dt[, order_code := paste(substr(block_code, 0, 1), collapse = ""), by = participant]

Or with base R:

merge(current, setNames(
  aggregate(
    block_code ~ participant,
    data = current,
    FUN = \(x) paste(substr(x, 0, 1), collapse = "")
  ),
  c("participant", "order_code")
), by = "participant")
  • Related