In R, is there a way to "normalize" timepoint data as a ratio of starting value?

Time:12-25

Let's say I have the following table of data, imported from Excel:

glasses_of_water <- tribble(
  ~glass,   ~hours,  ~mass,  ~volume,   ~temp,
  #-------|--------|--------|---------|--------|
  "A",       "0",     500,      5,        20,
  "B",       "0",     500,      5,        20,
  "C",       "0",     500,      5,        20,
  "B",       "10",    450,      5,        22,
  "C",       "10",    250,    2.5,        20,
  "A",       "10",    400,      4,        21,
  "A",       "30",    200,      1,        23,
  "B",       "30",    350,    3.5,        26,
  "C",       "30",      0,      0,        20,
  "C",       "20",      0,      0,        20,
  "B",       "20",    400,      4,        24,
  "A",       "20",    300,      3,        22,
  "A",       "42",    200,      1,        23,
  "B",       "44",    350,    3.5,        26,
  "C",       "46",      0,      0,        20,
)

Three glasses of water were analyzed at timepoints from 0 hours up to around 50 hours. I want to make line graphs for each measured parameter (mass, volume, temp), where I can see how these parameters change over time -- however, I want to view each parameter as a ratio of its starting value, where the 0-hour values are 1.0 and all later values are expressed relative to them.

For instance, glass C begins at a mass of 1.0, decreases to 0.5 at the 10-hour timepoint, then becomes 0.0 at the 20-hour timepoint.

Currently I have the data stored in an Excel workbook. I update the data with new measurements every few days, and I have a separate sheet just for the "normalized" values, where I manually divide each measurement by that subject's measurement at zero hours. This is time-consuming and I'd like to do this via R script, if possible.

I use the tidyverse suite of packages a lot, and I've tried something like...

data0 <- filter(glasses_of_water, hours == "0")
data <- glasses_of_water

ggplot(data) + geom_line(mapping = aes(x = data$hours, y = data$mass/data0$mass, group = glass, color = glass))

...where I created one object for the zero timepoint, and divided all data by these values. This works in this simplistic instance, but seems to fall apart more often than not, especially with bigger data sets that get separated by various factors, and with facet_grids, etc. I have a few data sets with thousands of rows and like 50 columns. My actual data is not in tribble format, it's simply an Excel import; could that be an issue?

At any rate, my question is: is there a simpler way to do this? Any help would be greatly appreciated.

CodePudding user response:

One approach is to group by glass, so that each glass's values are divided by that same glass's own 0-hour value rather than by position.

library(dplyr)
library(ggplot2)

glasses_of_water %>% 
  group_by(glass) %>% 
  mutate(nmass = mass / mass[hours == "0"]) %>% 
  ggplot() + 
    geom_line(aes(hours, nmass, group = glass, col = glass))
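The same idea can probably be extended to all three measured columns at once and to faceted plots. The following is only a sketch under a few assumptions: every glass has exactly one row with hours equal to 0, hours can be safely converted to numeric, and the names normalized, normalized_long, parameter and ratio are made up for illustration. The table is repeated from the question so that the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same table as in the question, repeated so the example is self-contained.
glasses_of_water <- tibble::tribble(
  ~glass, ~hours, ~mass, ~volume, ~temp,
  "A", "0",  500, 5,   20,
  "B", "0",  500, 5,   20,
  "C", "0",  500, 5,   20,
  "B", "10", 450, 5,   22,
  "C", "10", 250, 2.5, 20,
  "A", "10", 400, 4,   21,
  "A", "30", 200, 1,   23,
  "B", "30", 350, 3.5, 26,
  "C", "30", 0,   0,   20,
  "C", "20", 0,   0,   20,
  "B", "20", 400, 4,   24,
  "A", "20", 300, 3,   22,
  "A", "42", 200, 1,   23,
  "B", "44", 350, 3.5, 26,
  "C", "46", 0,   0,   20
)

normalized <- glasses_of_water %>%
  # Numeric hours give a continuous x axis, so uneven gaps are drawn to scale.
  mutate(hours = as.numeric(hours)) %>%
  group_by(glass) %>%
  # Divide each measured column by that glass's own 0-hour value.
  # Assumption: exactly one 0-hour row per glass; otherwise mutate() errors.
  mutate(across(c(mass, volume, temp), ~ .x / .x[hours == 0])) %>%
  ungroup()

# Long format: one row per glass, timepoint and parameter, ready for faceting.
normalized_long <- normalized %>%
  pivot_longer(c(mass, volume, temp),
               names_to = "parameter", values_to = "ratio")

p <- ggplot(normalized_long,
            aes(hours, ratio, group = glass, colour = glass)) +
  geom_line() +
  facet_wrap(~ parameter)
```

Printing p draws one panel per parameter. Because the division happens inside group_by(glass), the row order of the input does not matter, which is likely why dividing by a separate data0 object breaks down on larger or differently sorted data. Importing from Excel should not change anything here: if the import is done with readxl::read_excel (an assumption), the result is already a tibble like the one above.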