I have a dataset named trainset, and I'm trying to use a for loop to iterate over specific columns and sum up their values, repeating this for every row in the dataset.
I first defined a function that returns a prediction by calculating the total score:
point = 0
m.gwtg = function(Systolic.BP, BUN, Sodium, Age, HR, COPD){
if (Systolic.BP>=200){
point = 0
}else if (Systolic.BP>= 190){
point = 2
}else if (Systolic.BP>= 180){
point = 4
}else if (Systolic.BP>= 170){
point = 6
}else if (Systolic.BP>= 160){
point = 8
}else if (Systolic.BP>= 150){
point = 9
}else if (Systolic.BP>= 140){
point = 11
}else if (Systolic.BP>= 130){
point = 13
}else if (Systolic.BP>= 120){
point = 15
}else if (Systolic.BP>= 110){
point = 17
}else if (Systolic.BP>= 100){
point = 19
}else if (Systolic.BP>= 90){
point = 21
}else if (Systolic.BP>= 80){
point = 23
}else if (Systolic.BP>= 70){
point = 24
}else if (Systolic.BP>= 60){
point = 26
}else if (Systolic.BP>= 50){
point = 28
}
if (BUN>=150){
point = point + 28
}else if (BUN>= 140){
point = point + 27
}else if (BUN>= 130){
point = point + 25
}else if (BUN>= 120){
point = point + 23
}else if (BUN>= 110){
point = point + 21
}else if (BUN>= 100){
point = point + 19
}else if (BUN>= 90){
point = point + 17
}else if (BUN>= 80){
point = point + 15
}else if (BUN>= 70){
point = point + 13
}else if (BUN>= 60){
point = point + 11
}else if (BUN>= 50){
point = point + 9
}else if (BUN>= 40){
point = point + 8
}else if (BUN>= 30){
point = point + 6
}else if (BUN>= 20){
point = point + 4
}else if (BUN>= 10){
point = point + 2
}else if (BUN<= 9){
point = point + 0
}
if (Sodium>=139){
point = point + 0
}else if (Sodium>= 137){
point = point + 1
}else if (Sodium>= 134){
point = point + 2
}else if (Sodium>= 131){
point = point + 3
}else if (Sodium<= 130){
point = point + 4
}
if (Age>=110){
point = point + 28
}else if (Age>= 100){
point = point + 25
}else if (Age>= 90){
point = point + 22
}else if (Age>= 80){
point = point + 19
}else if (Age>= 70){
point = point + 17
}else if (Age>= 60){
point = point + 14
}else if (Age>= 50){
point = point + 11
}else if (Age>= 40){
point = point + 8
}else if (Age>= 30){
point = point + 6
}else if (Age>= 20){
point = point + 3
}else if (Age<= 19){
point = point + 0
}
if (HR>=105){
point = point + 8
}else if (HR>= 100){
point = point + 6
}else if (HR>= 95){
point = point + 5
}else if (HR>= 90){
point = point + 4
}else if (HR>= 85){
point = point + 3
}else if (HR>= 80){
point = point + 1
}else if (HR<= 79){
point = point + 0
}
if (COPD == 1){
point = point + 2
} else {
point = point + 0
}
if (point < 79){
outcome = 0
} else {
outcome = 1
}
}
Then I tried to write a for loop, which looks like this:
for (i in 1:nrow(trainset)) {
Systolic.BP[i] <- trainset$`Systolic blood pressure`[i]
BUN[i] <- trainset$`Urea nitrogen`[i]
Sodium[i] <- trainset$`Blood sodium`[i]
Age[i] <- trainset$age[i]
HR[i] <- trainset$`heart rate`[i]
COPD[i] <- trainset$COPD[i]
outcome.pred.gwtg[i]= m.gwtg(Systolic.BP[i], BUN[i], Sodium[i], Age[i], HR[i], COPD[i])
}
But when I ran it, I got an error: Error: object 'Systolic.BP' not found
I'm quite confused about how to write a for loop that goes through the rows and columns. Does anyone know how to solve this problem? Thanks!
CodePudding user response:
The reason you are getting the error is that the first time the loop runs, the line
Systolic.BP[i] <- trainset$`Systolic blood pressure`[i]
tries to write the first entry of trainset$`Systolic blood pressure` into the first position of a vector called Systolic.BP. But this vector doesn't exist yet.
If you are using the subsetting operator [ on the left-hand side of an assignment, the vector must already be defined. For example, I get an error if I do:
for(i in 1:10) {
x[i] <- i
}
#> Error: object 'x' not found
This is because x doesn't exist when I try to write to its first position. The correct way to do this loop would be:
x <- numeric(10)
for(i in 1:10) {
x[i] <- i
}
x
#> [1] 1 2 3 4 5 6 7 8 9 10
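As a side note, the error is about the object not existing at all, not about its length. The following is a minimal sketch (the names x and y are purely illustrative) showing that indexed assignment also works on an existing empty vector, which R grows as needed, although preallocating to the final length is usually the better habit:

```r
# Indexed assignment works once the object exists, even if it is empty;
# R extends the vector each time a new position is written.
x <- numeric(0)
for (i in seq_len(10)) {
  x[i] <- i
}

# Preallocating to the final length avoids growing the vector repeatedly.
y <- numeric(10)
for (i in seq_along(y)) {
  y[i] <- i
}

# Both approaches end up with the same vector.
identical(x, y)
```

Using seq_len() and seq_along() instead of 1:n also avoids a surprising 1:0 loop when the length happens to be zero.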
It's not clear to me why you need to write each variable separately for passing to the function inside the loop anyway - you could just do:
outcome.pred.gwtg <- numeric(nrow(trainset))
for (i in 1:nrow(trainset)) {
outcome.pred.gwtg[i] <- m.gwtg(trainset$`Systolic blood pressure`[i],
trainset$`Urea nitrogen`[i],
trainset$`Blood sodium`[i],
trainset$age[i],
trainset$`heart rate`[i],
trainset$COPD[i])
}
Another option, since you are only using the new variable names inside the loop, is to do:
outcome.pred.gwtg <- numeric(nrow(trainset))
for (i in 1:nrow(trainset)) {
Systolic.BP <- trainset$`Systolic blood pressure`[i]
BUN <- trainset$`Urea nitrogen`[i]
Sodium <- trainset$`Blood sodium`[i]
Age <- trainset$age[i]
HR <- trainset$`heart rate`[i]
COPD <- trainset$COPD[i]
outcome.pred.gwtg[i]= m.gwtg(Systolic.BP, BUN, Sodium, Age, HR, COPD)
}
Also, note that there's no need to fill these vectors element by element in the first place. You can create them outside the loop:
Systolic.BP <- trainset$`Systolic blood pressure`
BUN <- trainset$`Urea nitrogen`
Sodium <- trainset$`Blood sodium`
Age <- trainset$age
HR <- trainset$`heart rate`
COPD <- trainset$COPD
outcome.pred.gwtg <- numeric(nrow(trainset))
for (i in 1:nrow(trainset)) {
outcome.pred.gwtg[i]= m.gwtg(Systolic.BP[i], BUN[i], Sodium[i], Age[i], HR[i], COPD[i])
}
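If you would rather not write the loop yourself, base R's mapply() can call a function once per row by pairing up the i-th elements of several vectors. The sketch below is only an illustration: toy_score and toy are hypothetical stand-ins for m.gwtg and trainset, kept small so the example runs on its own.

```r
# Hypothetical stand-in for a scalar scoring function like m.gwtg:
# it expects one value per argument, not whole columns.
toy_score <- function(sbp, age) {
  point <- 0
  if (sbp < 100) point <- point + 2
  if (age >= 70) point <- point + 1
  point
}

# Hypothetical stand-in for the data frame.
toy <- data.frame(sbp = c(95, 120, 88), age = c(72, 40, 65))

# mapply() calls toy_score(toy$sbp[i], toy$age[i]) for each i
# and collects the results into a vector.
scores <- mapply(toy_score, toy$sbp, toy$age)
scores
```

With the real data, the same pattern would pass the six trainset columns to m.gwtg in the order of its arguments.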
CodePudding user response:
Your loop can't assign to the i'th vector element Systolic.BP[i] before calling m.gwtg(...) because you apparently haven't created the vector Systolic.BP itself beforehand.
Anyhow: you're working with a data.frame ("trainset"), and there are a couple of more efficient ways to do this in R.
Example (using dplyr):
library(dplyr)
trainset %>%
rename(
Systolic.BP = `Systolic blood pressure`,
## other renaming instructions
## of the form new_name = old_name ...
HR = `heart rate`
) %>%
rowwise %>%
mutate(
outcome.pred.gwtg = m.gwtg(Systolic.BP,
## other renamed predictors ...
COPD)
)
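Another route, not part of the answer above, is to replace a long if/else chain with a vectorized lookup so that a whole column can be scored at once. Below is a rough sketch for the sodium component only, using base R's findInterval(). The helper name sodium_points is hypothetical, the cut-offs are copied from the question's function, and values between 130 and 131 (which the original chain does not cover) are assumed to score 4.

```r
# Hypothetical helper: vectorized version of the sodium if/else chain.
# Assumption: anything below 131 scores 4, including values in (130, 131).
sodium_points <- function(sodium) {
  breaks <- c(131, 134, 137, 139)  # lower bounds of each band
  points <- c(4, 3, 2, 1, 0)       # score for <131, 131-133, 134-136, 137-138, >=139
  # findInterval() returns 0 for values below the first break,
  # so add 1 to turn it into a valid index into 'points'.
  points[findInterval(sodium, breaks) + 1]
}

sodium_points(c(128, 131, 135, 138, 140))
```

Each of the other components could be handled the same way with its own breaks and points, and the component scores then added together column-wise without any loop.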