I have some metabolomics data I am trying to process (validate the compounds that are actually present).
`'data.frame': 544 obs. of 48 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ No. : int 2 32 34 95 114 141 169 234 236 278 ...
$ RT..min. : num 0.89 3.921 0.878 2.396 0.845 ...
$ Molecular.Weight : num 70 72 72 78 80 ...
$ m.z : num 103 145 114 120 113 ...
$ HMDB.ID : chr "HMDB0006804" "HMDB0031647" "HMDB0006112" "HMDB0001505" ...
$ Name : chr "Propiolic acid" "Acrylic acid" "Malondialdehyde" "Benzene" ...
$ Formula : chr "C3H2O2" "C3H4O2" "C3H4O2" "C6H6" ...
$ Monoisotopic_Mass: num 70 72 72 78 80 ...
$ Delta.ppm. : num 1.295 0.833 1.953 1.023 0.102 ...
$ X1 : num 288.3 16.7 1130.9 3791.5 33.5 ...
$ X2 : num 276.8 13.4 1069.1 3228.4 44.1 ...
$ X3 : num 398.6 19.3 794.8 2153.2 15.8 ...
$ X4 : num 247.6 100.5 1187.5 1791.4 33.4 ...
$ X5 : num 98.4 162.1 1546.4 1646.8 45.3 ...`
I tried to write a loop so that if the Delta.ppm value is larger than (m/z - molecular weight)/molecular weight, the entire row is deleted in the subsequent dataframe.
for (i in 1:nrow(rawdata)) {
ppm <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
rawdata$Molecular.Weight[i]
if (ppm > rawdata$Delta.ppm[i]) {
filtered_data <- rbind(filtered_data, rawdata[i,])
}
}
Instead of giving me a new df with the validated compounds, under the 'Values' section, it generates a single number for 'ppm'.
Still very new to R, any help is super appreciated!
CodePudding user response:
No need to do this row-by-row, we can remove all undesired rows in one operation:
## base R good <- with(rawdat, (m.z - Molecular.Weight)/Molecular.Weight < Delta.ppm.) newdat <- rawdat[good, ] ## dplyr newdat <- filter(rawdat, (m.z - Molecular.Weight)/Molecular.Weight < Delta.ppm.)
Iteratively adding rows to a frame using
rbind(old, newrow)
works in practice but scales horribly, see "Growing Objects" in The R Inferno. For each row added, it makes a complete copy of all rows inold
, which works but starts to slow down a lot. It is far better to produce a list of these new rows and thenrbind
them at one time; e.g.,out <- list() for (...) { # ... newrow ... out <- c(out, list(newrow)) } alldat <- do.call(rbind, out)
CodePudding user response:
ppm[i] <- NULL
for (i in 1:nrow(rawdata)) {
ppm[i] <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
rawdata$Molecular.Weight[i]
if (ppm[i] > rawdata$Delta.ppm[i]) {
filtered_data <- rbind(filtered_data, rawdata[i,])
}
}