I have a data frame with the structure below:
head(test)
geneA geneB start end position
1 Ypc1 Malat1 34 59 36
2 Ypc1 Malat1 35 60 26
3 Ypc1 Malat1 34 59 60
I want to add a new column called distance, based on conditional math operations on three columns: start, end and position. I used the if statements below, but I constantly get 0 for the distance column. My code and the output it produces look like this:
if (test$position < test$start) {
test$distance <- test$start - test$position
} else if (test$position >= test$start & test$position <= test$end) {
test$distance <- 0
} else if (test$position > test$end) {
test$distance <- test$end - test$position
}
head(test)
geneA geneB start end position distance
1 Ypc1 Malat1 34 59 36 0
2 Ypc1 Malat1 35 60 26 0
3 Ypc1 Malat1 34 59 60 0
The desired output should be:
geneA geneB start end position distance
1 Ypc1 Malat1 34 59 36 0
2 Ypc1 Malat1 35 60 26 9
3 Ypc1 Malat1 34 59 60 -1
How can I do this?
Thank you in advance.
CodePudding user response:
When testing a condition along a vector, you should use ifelse.
I corrected your code below:
test <- data.frame(geneA = c("Ypc1"), geneB = c("Malat1"),
                   start = c(34, 35, 34),
                   end = c(59, 60, 59),
                   position = c(36, 26, 60))
test$distance <- ifelse(
  test$position < test$start,
  test$start - test$position,
  ifelse(
    test$position > test$end,
    test$end - test$position,
    0
  ))
test
# geneA geneB start end position distance
# 1 Ypc1 Malat1 34 59 36 0
# 2 Ypc1 Malat1 35 60 26 9
# 3 Ypc1 Malat1 34 59 60 -1
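Your code doesn't work because if is not vectorized: it looks only at the first element of the condition (R versions before 4.2.0 give a warning, newer versions throw an error), and the assignment inside the chosen branch then replaces the whole distance column. For the first row the second branch is the one that matches, so every row gets 0.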
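To see why every row ends up with 0, it can help to look at the conditions as vectors. This is only a small sketch; the names before and inside are made up for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
test <- data.frame(geneA = "Ypc1", geneB = "Malat1",
                   start = c(34, 35, 34),
                   end = c(59, 60, 59),
                   position = c(36, 26, 60))

# Each comparison returns one value per row, not a single TRUE/FALSE
before <- test$position < test$start
inside <- test$position >= test$start & test$position <= test$end

before  # FALSE  TRUE FALSE
inside  #  TRUE FALSE FALSE

# if () needs a single value; older R versions used only the first
# element (with a warning), so the branch was chosen from row 1 alone
inside[1]  # TRUE, which is why the whole column was set to 0
```

ifelse, by contrast, evaluates the test for every row and picks the matching value element by element.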
However, the nested ifelse is not very readable; I'll look for a shorter way to compute this.
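One possible shorter form, as a sketch only, uses the base R functions pmax and pmin instead of nested ifelse calls. It assumes start <= end in every row, so that at most one of the two terms is non-zero; gap_before and gap_after are illustrative names.

```r
# Rebuild the example data so the snippet is self-contained
test <- data.frame(geneA = "Ypc1", geneB = "Malat1",
                   start = c(34, 35, 34),
                   end = c(59, 60, 59),
                   position = c(36, 26, 60))

# Positive gap when position lies before start, otherwise 0
gap_before <- pmax(test$start - test$position, 0)

# Negative gap when position lies after end, otherwise 0
gap_after <- pmin(test$end - test$position, 0)

# Assumption: start <= end, so the two gaps never overlap
test$distance <- gap_before + gap_after

test$distance  # 0 9 -1
```

This gives the same distance column as the nested ifelse for the example rows. Whether it is easier to read is a matter of taste; the ifelse version states the three cases more explicitly.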