I would like to be able to import 96 well plate data (likely a csv file but can change that) into R and have it as a long dataframe with wellID and value as column headers. I am trying to do calculations with 2 different timepoints so I think that it would be easiest to have one dataframe with wellID, t0 and t1. I would think that I would need to import 2 csv files and then combine to make a larger dataframe. I'm not sure, I'm new to using R so any help is much appreciated.
1 2 3 4 5 6
A 1 3 3 2 2 2
B 1 2 3 4 2 1
C 1 1 2 5 3 1
D 2 2 2 2 2 2
This is about a quarter of what I would get from one 96 well plate. I would like to read this into R to get it to look like
wellID value
A1 1
A2 3
A3 3
A4 2
A5 2
I will then have another 96 well plate to read in from a a different timepoint to have something like:
wellID value t0 value t1
A1 1 2
A2 3 4
A3 3 4
A4 2 3
A5 2 3
All the data that I have now is in the 96 well format, I would like it to be in a more usable format like the last example table that I have.
CodePudding user response:
I'm assuming each 96 well measurement (at a specific time) is in a separate .csv file. You can alter the below answer into a function if you find it works (you may need to amend the X1, X2 to reflect the actual column numbers depending on how your data reads in, the annotation hopefully makes this clear).
library(tidyverse)
df <- read.csv("yourDataFrame.csv")
# Rename the columns (mine always imports with the first column (the row designation) as i.. and the column numbers as X1, X2, etc.)
df <- df %>%
rename("row" = 1,# First column will be 'row'
"1" = "X1",# X1 will be renamed '1' (you may have to substitute 'X1' for 2 so it changes the second column, this approach just works given my imported dataset designates the second column as 'X1'. Note this column refers to '1' as per your plate)
"2" = "X2",
"3" = "X3",
"4" = "X4",
"5" = "X5",
"6" = "X6",
"7" = "X7",
"8" = "X8",
"9" = "X9",
"10" = "X10",
"11" = "X11",
"12" = "X12"
)
# Make it longer first so each row is an individual plate observation
df <- df %>%
pivot_longer((!row), names_to = "plateColumn", values_to = "OpticalDensity")
# Combine the first and second columns into well positions
df <- df %>%
mutate(wellID = paste0(row, plateColumn)) %>%
select(-c(row, plateColumn))
# Make your wells a factor (optional but makes sense given reality)
df$wellID <- as_factor(df$wellID)
Having got your first dataframe into shape you can then apply the same to the second dataframe (having read it in). When they're both in shape just bind them together and -select() whichever column you don't want (you'll have a duplicated wellID column). You can then move them around to read well ID, t0, t1 as you like.
It's a bit hacky, granted, but it works.
CodePudding user response:
Hacktacular42's answer pretty much addresses your question, but I have two additional comments that may be helpful:
Instead of having
t0
andt1
columns, it would be better to havetimepoint
andvalue
columns. For example:row col timepoint value A 1 0 1 A 2 0 3 A 3 0 3 A 4 0 2 A 5 0 2 A 1 1 2 A 2 1 4 A 3 1 4 A 4 1 3 A 5 1 3 This is a concept called "tidy data". If you aren't familiar, this article is a good introduction. The basic idea is that it's hard to work with data that's encoded in column names. To see why, imagine you measure two values for each well instead of just one (e.g. OD600 and temperature): now you need four columns total (two for each timepoint). Or imagine that you want to plot the values over time: since the timepoint numbers don't actually appear in the data frame, this isn't easy to do (especially if you end up adding more timepoints).
Loading the data into a dataframe that associates each measurement with a particular well is only the first step in your data analysis. The second step is to associate each well with the appropriate experimental conditions, e.g. cell type, drug concentration, or whatever's being varied in your experiment. This second step isn't trivial, especially if you want to be able to reuse your analysis scripts for different layouts without having to rewrite them each time. To help with this, I wrote a library called wellmap.
Here's an example of how you could use wellmap to load and label your data. For this, I'll assume that (i) you have 5 samples in wells A1-A5, (ii) each column represents a different drug concentration, (iii) the data for each timepoint are in different CSV files, and (iv) the CSV files are named
t0.csv
andt1.csv
, respectively. The first step is to make a file that describes the plate layout:# layout.toml [meta.paths] t0 = 't0.csv' t1 = 't1.csv' [plate] t0.timepoint = 0 t1.timepoint = 1 [col] 1.drug_conc = 8 2.drug_conc = 4 3.drug_conc = 2 4.drug_conc = 1 5.drug_conc = 0 [row.A]
The next step is to load a dataframe from this layout. There's a lot of detail I'm not explaining here (both with regards to the layout file and the
load()
function), but the documentation explains everything in detail:library(tidyverse) library(wellmapr) load_timepoint <- function(path) { read_csv(path) %>% rename(row = 1) %>% pivot_longer( !row, names_to = "col", values_to = "value", ) } df <- wellmapr::load( "layout.toml", data_loader = load_timepoint, merge_cols = TRUE, )
This script produces the following data frame. Note how each row specifies a well, both experimental conditions (drug concentration and timepoint), and the corresponding value from the CSV file. This will make it really easy to plot whatever relationship(s) you're interested in:
well well0 row col row_i col_j plate drug_conc timepoint value 0 A1 A01 A 1 0 0 t0 8 0 1 1 A2 A02 A 2 0 1 t0 4 0 3 2 A3 A03 A 3 0 2 t0 2 0 3 3 A4 A04 A 4 0 3 t0 1 0 2 4 A5 A05 A 5 0 4 t0 0 0 2 5 A1 A01 A 1 0 0 t1 8 1 2 6 A2 A02 A 2 0 1 t1 4 1 4 7 A3 A03 A 3 0 2 t1 2 1 4 8 A4 A04 A 4 0 3 t1 1 1 3 9 A5 A05 A 5 0 4 t1 0 1 3