I need a function f(B,A) that, given a dataset with the following structure,
T1 T2 T3 T4 T5 ... P1 P2 P3 P4 P5 ...
1 2 5 8 9 ... A C B B A ...
1 3 4 6 6 ... C A C A B ...
finds the first time B and A appear in Pj columns (starting with j=1) and returns the value difference in the corresponding Ti columns. For instance:
- in line 1: B appears in P3 first, A appears in P1 first. Then:
f(B, A) = T3 - T1 = 5-1 = 4
- in line 2: B appears in P5 first, A appears in P2 first. Then:
f(B, A) = T5 - T2 = 6-3 = 3
I can find in which Pj columns B and A appear using str_detect() , but I don't know how to "move" from P_j1, P_j2 to T_j1, T_j2.
Using datatable syntax (or base R) will be appreciated
CodePudding user response:
Here is a data.table
approach.
library(data.table)
DT <- fread("T1 T2 T3 T4 T5 P1 P2 P3 P4 P5
1 2 5 8 9 A C B B A
1 3 4 6 6 C A C A B")
# Add row ID's
DT[, id := .I]
#melt to a long format
DT.melt <- data.table::melt(DT,
id.vars = "id",
measure.vars = patterns(T = "^T", P = "^P"))
# Find first B for each id
val1 <- DT.melt[P == "B", T[1], by = .(id)]$V1
# [1] 5 6
# Find first A for each id
val2 <- DT.melt[P == "A", T[1], by = .(id)]$V1
# [1] 1 3
val1 - val2
# [1] 4 3
CodePudding user response:
base R
f <- function(l1, l2){
apply(df, 1, function(x){
dfP <- x[grepl("P", names(x))]
dfT <- x[grepl("T", names(x))]
as.numeric(dfT[which(dfP == l1)[1]]) - as.numeric(dfT[which(dfP == l2)[1]])
})
}
f("B", "A")
[1] 4 3
Tidyverse
With this type of data, it's usually best to pivot to long and then back to wide: here is a tidyverse
solution, with diff
being the desired output.
library(tidyverse)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_pattern = "(\\D)(\\d)",
names_to = c(".value", "group")) %>%
group_by(id) %>%
mutate(diff = first(T[P == "B"]) - first(T[P == "A"])) %>%
pivot_wider(c(id, diff), names_from = group, values_from = c(T, P), names_sep = "")
output
id diff T1 T2 T3 T4 T5 P1 P2 P3 P4 P5
<int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr> <chr> <chr>
1 1 4 1 2 5 8 9 A C B B A
2 2 3 1 3 4 6 6 C A C A B