I would like to find the combination of some value in a dataframe from one column and some row with the value from the same column and the previous row. Here is a reproducible example.
I have four possible conditions and I want to identify pairs of (two) successive rows.
library(tidyverse)
library(purrr)
df <- data.frame(
condition = sample(
x = c("IL", "IR", "EL", "ER"), size = 10, replace = TRUE
)
)
df
condition
1 IR
2 EL
3 IR
4 EL
5 ER
6 IL
7 IR
8 EL
9 EL
10 EL
In other words, I would like to get a "pair" column such as:
condition pair
1 IR NA
2 EL IR-EL
3 IR EL-IR
4 EL IR-EL
5 ER EL-ER
6 IL ER-IL
7 IR IL-IR
8 EL IR-EL
9 EL EL-EL
10 EL EL-EL
I have tried using purrr::accumulate() in combination with paste() but it accumulates all previous values. How can I only keep the previous and current values (i.e., lag 1)?
df %>%
mutate(
pair = accumulate(
.x = condition,
.f = ~paste(.x, .y, sep = "-")
)
)
condition pair
1 IL IL
2 ER IL-ER
3 IL IL-ER-IL
4 IL IL-ER-IL-IL
5 EL IL-ER-IL-IL-EL
6 EL IL-ER-IL-IL-EL-EL
7 IL IL-ER-IL-IL-EL-EL-IL
8 IL IL-ER-IL-IL-EL-EL-IL-IL
9 IL IL-ER-IL-IL-EL-EL-IL-IL-IL
10 IR IL-ER-IL-IL-EL-EL-IL-IL-IL-IR
I only want to keep pairs of two successive trials. Any idea? NB: I would prefer a tidyverse option.
CodePudding user response:
Use lag
.
df %>%
mutate(pair = paste(lag(condition),condition,sep="-"))
condition pair
1 ER NA-ER
2 EL ER-EL
3 IR EL-IR
4 ER IR-ER
5 ER ER-ER
6 IR ER-IR
7 IR IR-IR
8 IR IR-IR
9 IR IR-IR
10 EL IR-EL