Loop in R not working, generating single Value

I have some metabolomics data I am trying to process (validate the compounds that are actually present).

```
'data.frame':  544 obs. of  48 variables:
 $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
 $ No.              : int  2 32 34 95 114 141 169 234 236 278 ...
 $ RT..min.         : num  0.89 3.921 0.878 2.396 0.845 ...
 $ Molecular.Weight : num  70 72 72 78 80 ...
 $ m.z              : num  103 145 114 120 113 ...
 $ HMDB.ID          : chr  "HMDB0006804" "HMDB0031647" "HMDB0006112" "HMDB0001505" ...
 $ Name             : chr  "Propiolic acid" "Acrylic acid" "Malondialdehyde" "Benzene" ...
 $ Formula          : chr  "C3H2O2" "C3H4O2" "C3H4O2" "C6H6" ...
 $ Monoisotopic_Mass: num  70 72 72 78 80 ...
 $ Delta.ppm.       : num  1.295 0.833 1.953 1.023 0.102 ...
 $ X1               : num  288.3 16.7 1130.9 3791.5 33.5 ...
 $ X2               : num  276.8 13.4 1069.1 3228.4 44.1 ...
 $ X3               : num  398.6 19.3 794.8 2153.2 15.8 ...
 $ X4               : num  247.6 100.5 1187.5 1791.4 33.4 ...
 $ X5               : num  98.4 162.1 1546.4 1646.8 45.3 ...
```

I tried to write a loop so that if a row's Delta.ppm. value is larger than (m/z - molecular weight) / molecular weight, that entire row is left out of the resulting data frame.

for (i in 1:nrow(rawdata)) {
  ppm <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
    rawdata$Molecular.Weight[i]

  if (ppm > rawdata$Delta.ppm[i]) {

    filtered_data <- rbind(filtered_data, rawdata[i,])
  }
}

Instead of giving me a new df with the validated compounds, under the 'Values' section, it generates a single number for 'ppm'.

Still very new to R, any help is super appreciated!

CodePudding user response：

There is no need to do this row by row; all undesired rows can be removed in one operation:

## base R
## TRUE for rows to keep: same test as the loop in the question
good <- with(rawdata, (m.z - Molecular.Weight) / Molecular.Weight > Delta.ppm.)
newdat <- rawdata[good, ]
## dplyr
library(dplyr)
newdat <- filter(rawdata, (m.z - Molecular.Weight) / Molecular.Weight > Delta.ppm.)
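
To see the vectorised filter in action, here is a minimal sketch on a small made-up data frame. The name `toy` and all of its values are invented for illustration, and the comparison is the same one as in the question's loop (keep a row when the relative difference exceeds `Delta.ppm.`):

```r
# Toy stand-in for rawdata; names and values are invented for illustration only.
toy <- data.frame(
  Name             = c("A", "B", "C", "D"),
  m.z              = c(103, 145, 114, 120),
  Molecular.Weight = c(70, 72, 72, 78),
  Delta.ppm.       = c(1.295, 0.833, 1.953, 0.102),
  stringsAsFactors = FALSE
)

# One relative difference per row, computed in a single vectorised step.
rel_diff <- (toy$m.z - toy$Molecular.Weight) / toy$Molecular.Weight

# Logical vector: TRUE where the row passes the same test as the loop.
keep <- rel_diff > toy$Delta.ppm.

# Indexing with the logical vector keeps whole rows at once.
toy_filtered <- toy[keep, ]
print(toy_filtered)
```

Because `rel_diff` and `keep` are full-length vectors rather than single numbers, each row's result can be inspected before any rows are dropped.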

Iteratively adding rows to a frame with rbind(old, newrow) works but scales badly; see "Growing Objects" in The R Inferno. Each call makes a complete copy of all rows in old, so the loop gets slower and slower as the frame grows. It is far better to collect the new rows in a list and rbind them once at the end; e.g.,
```
out <- list()
for (...) {
  # ... newrow ...
  out <- c(out, list(newrow))
}
alldat <- do.call(rbind, out)
```
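
A runnable sketch of the same pattern, applied to the filtering rule from the question, could look like the following. The `toy` data frame and its values are invented, and the list is pre-sized with `vector("list", n)` instead of being grown with `c()`, which is a small variation on the skeleton above rather than part of it:

```r
# Toy stand-in for rawdata; names and values are invented for illustration only.
toy <- data.frame(
  m.z              = c(103, 145, 114, 120),
  Molecular.Weight = c(70, 72, 72, 78),
  Delta.ppm.       = c(1.295, 0.833, 1.953, 0.102)
)

# Pre-size the list so that nothing is grown inside the loop.
out <- vector("list", nrow(toy))

for (i in seq_len(nrow(toy))) {
  rel_diff <- (toy$m.z[i] - toy$Molecular.Weight[i]) / toy$Molecular.Weight[i]
  if (rel_diff > toy$Delta.ppm.[i]) {
    out[[i]] <- toy[i, ]  # slots for rejected rows simply stay NULL
  }
}

# Drop the empty slots, then bind all kept rows in a single call.
out <- Filter(Negate(is.null), out)
alldat <- do.call(rbind, out)
print(alldat)
```

The single `do.call(rbind, out)` at the end copies each kept row once, instead of recopying the whole accumulated frame on every iteration.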

CodePudding user response：

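# Create both objects before the loop so that they exist on the first iteration
ppm <- numeric(nrow(rawdata))
filtered_data <- rawdata[0, ]  # same columns as rawdata, zero rows

for (i in seq_len(nrow(rawdata))) {
  # Store each row's value in ppm[i] instead of overwriting a single number
  ppm[i] <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
    rawdata$Molecular.Weight[i]

  if (ppm[i] > rawdata$Delta.ppm.[i]) {
    filtered_data <- rbind(filtered_data, rawdata[i, ])
  }
}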