I have a problem and I am unable to find similar results (also because I am not sure how to word the search itself). I have a huge matrix with multiple columns. This matrix also has a column that specifies the "identity" of each line. I also have a second matrix with thresholds for each column for each identity. I can simplify the problem with the following code (although, take in mind that I have many many columns).
temp_df=data.frame(a_name=rep(c("A","B", "C"), 20), matrix(rnorm(40),nrow=60, ncol=2))
That contains:
> head(temp_df)
a_name X1 X2
1 A 0.31469191 -0.2763107
2 B -1.17477425 -0.2066650
3 C 0.10651550 0.1581325
4 A -1.88258477 -0.9280463
5 B -2.58260181 1.0322196
6 C 0.03027953 0.3110290
And the threshold matrix:
temp_limits=data.frame(a_name=c("A", "B", "C"), X1=c(1, 0.5, 0), x2=c(-1, -0.5, -0.25))
That contains:
> temp_limits
a_name X1 x2
1 A 1.0 -1.00
2 B 0.5 -0.50
3 C 0.0 -0.25
I could process the matrix as follow to obtain what I want:
res_df=NULL
for(i in unique(temp_df$a_name)){
tdf=temp_df[temp_df$a_name==i,]
a_lim=temp_limits[temp_limits$a_name==i,]
tdf[, 2:3]=sweep(as.matrix(tdf[, 2:3]), MARGIN=2, a_lim[, 2:3], FUN=">" )
res_df=rbind(res_df, tdf)
}
That gives what I want:
> head(res_df)
a_name X1 X2
1 A FALSE TRUE
4 A FALSE TRUE
7 A FALSE TRUE
10 A FALSE TRUE
13 A FALSE FALSE
16 A FALSE TRUE
But I believe this implementation is too long, cumbersome, and not optimal. I am thinking I should use group_by, by I am unsure on how to proceed. Any help?
CodePudding user response:
You can try this
cbind(temp_df[1], temp_df[-1] > temp_limits[match(temp_df$a_name, temp_limits$a_name), -1])
CodePudding user response:
We may use a join here
library(data.table)
setDT(temp_df)[temp_limits, .(a_name, X1 = X1 > i.X1, X2 = X2 > i.x2),
on = .(a_name)]