I am trying to solve a problem: to create a new variable with values 0 or 1 to define if a data frame contains a value stored in one column. I asked this question previously (Check if subset contains said value) and I got a tip to use setDT() function. It works, but I have trouble using it for two conditions. To explain briefly:
- I have two data frames:
###df
sub_ID <- c("sub01", "sub01", "sub01", "sub01", "sub01", "sub01")
condition <- c("A", "A", "A", "B", "B", "B")
number <- c(1,2,3,1,2,3)
df <- data.frame(sub_ID, condition, number)
###test_df
test_sub_ID <- c("sub01", "sub01", "sub01", "sub01", "sub01", "sub01")
test_condition <- c("A", "A", "B", "B", "C", "C")
test_number <- c(2,7,1,4,8,9)
test_df <- data.frame(test_sub_ID, test_condition, test_number)
I want to check row-wise, if each number from df is present in test_df, according to two features. So - if in df I have the first row, the number is 1, I want to check if in test_df there was 1 for sub01, in condition A, together.
I got the answer that I should use this function: setDT(df)[test_df, new := number %in% test_number, on = .(sub_ID = test_sub_ID, condition = test_condition)]
Thank you, @akrun!
- The problem is I want to have two conditions included at the same time, meanwhile, it seems not to work as I want it to.
Do you have any ideas on how to solve this problem?
CodePudding user response:
You could loop over the rows of df
using subset in apply
and count the nrow
.
df$test <- apply(df, 1, \(x) nrow(subset(test_df, test_sub_ID == x[1] &
test_condition == x[2] &
test_number == x[3])))
df
# sub_ID condition number test
# 1 sub01 A 1 0
# 2 sub01 A 2 1
# 3 sub01 A 3 0
# 4 sub01 B 1 1
# 5 sub01 B 2 0
# 6 sub01 B 3 0
This is similar to this for
loop which should help you to understand the code and how to write such code by yourself. Use the apply
above though because it is much faster!
df$test <- NA
for (i in seq_len(nrow(df))) {
df$test[i] <- nrow(subset(test_df, test_sub_ID == df[i, 1] &
test_condition == df[i, 2] &
test_number == df[i, 3]))
}
CodePudding user response:
You can extend your data.table
join approach:
setnames(setDT(test_df), new=names(df))
test_df[setDT(df), j=.N, by=.EACHI, on=.(sub_ID, condition, number)]
Output:
sub_ID condition number test
1: sub01 A 1 0
2: sub01 A 2 1
3: sub01 A 3 0
4: sub01 B 1 1
5: sub01 B 2 0
6: sub01 B 3 0
Note: I use setnames()
as a convenience only: Making the names of test_df
the same as the names of df
facilitates a more concise on=
clause.