i am having difficulties combining/mergin 6 replicate dataframes into a single long format dataframe. By replicates, I mean two data parameters measured from the same experiment repeated 3 times.
I have made small examples of what the data looks like:
GFP exp 1.
GFP1:
time species a species b species c . . .
1 0.24 -1518 -1566 163 . . .
2 0.72 -1408 -1533 43 . . .
3 0.96 -1460 -1573 -133 . . .
4 1.20 -1462 -1505 -398 . . .
(and GFP2 & GFP3 with same format but different values)
OD exp 1. OD1:
time species a species b species c . . .
1 0.24 -0.03 -0.03 -0.01 . . .
2 0.72 -0.03 -0.03 0.00 . . .
3 0.96 -0.05 -0.04 -0.01 . . .
4 1.20 -0.04 -0.03 0.00 . . .
(and then OD2 & OD3 with same format but different values)
This means i have 6 seperate datasets, with same amount of columns and rows but with different data values. What i would like to end up with is one long format dataframe that looks like this, preferably with an experiment id added:
time experiment media od gfp
1 0.24 1 species a -0.03 -1518
2 0.24 2 species a -0.02 -1519
3 0.24 3 species a -0.02 -1242
4 0.72 1 species a -0.03 -1408
5 0.72 2 species a -0.04 -1470
6 0.72 3 species a -0.04 -1228
7 0.96 1 species a -0.05 -1460
8 0.96 2 species a -0.06 -1478
9 0.96 3 species a -0.05 -1214
10 1.20 1 species a -0.04 -1462
11 1.20 2 species a -0.05 -1402
12 1.20 3 species a -0.06 -1227
I hope everything makes sense, if not i will be happy to elaborate. Thanks in advance.
CodePudding user response:
Two ways to tackle this:
put all the datasets in a list and then bind them together then reshape:
lst(od, gfp) %>%
bind_rows(.id = 'name')%>%
reshape2::recast(time variable ~ name, id.var = c('name', 'time'))
time variable gfp od
1 0.24 species_a -1518 -0.03
2 0.24 species_b -1566 -0.03
3 0.24 species_c 163 -0.01
4 0.72 species_a -1408 -0.03
5 0.72 species_b -1533 -0.03
6 0.72 species_c 43 0.00
7 0.96 species_a -1460 -0.05
8 0.96 species_b -1573 -0.04
9 0.96 species_c -133 -0.01
10 1.20 species_a -1462 -0.04
11 1.20 species_b -1505 -0.03
12 1.20 species_c -398 0.00
Second is an extension of the previous method:
lst(od, gfp)%>%
imap(~pivot_longer(.,-time, values_to = .y))%>%
reduce(full_join)
A tibble: 12 × 4
time name od gfp
<dbl> <chr> <dbl> <int>
1 0.24 species_a -0.03 -1518
2 0.24 species_b -0.03 -1566
3 0.24 species_c -0.01 163
4 0.72 species_a -0.03 -1408
5 0.72 species_b -0.03 -1533
6 0.72 species_c 0 43
7 0.96 species_a -0.05 -1460
8 0.96 species_b -0.04 -1573
9 0.96 species_c -0.01 -133
10 1.2 species_a -0.04 -1462
11 1.2 species_b -0.03 -1505
12 1.2 species_c 0 -398
CodePudding user response:
To further illustrate my comment:
od <- read.table(text = "time species_a species_b species_c
0.24 -0.03 -0.03 -0.01
0.72 -0.03 -0.03 0.00
0.96 -0.05 -0.04 -0.01
1.20 -0.04 -0.03 0.00",
header = TRUE, sep = "", check.names = F)
od
gfp <- read.table(text = "time species_a species_b species_c
0.24 -1518 -1566 163
0.72 -1408 -1533 43
0.96 -1460 -1573 -133
1.20 -1462 -1505 -398",
header = TRUE, sep = "", check.names = F)
gfp
library(reshape2)
od2 <- melt(data = od, id.vars = "time", value.name = "od")
gfp2 <- melt(data = gfp, id.vars = "time", value.name = "gfp")
(df <- merge(x = od2, y = gfp2, all = TRUE))
# time variable od gfp
# 1 0.24 species_a -0.03 -1518
# 2 0.24 species_b -0.03 -1566
# 3 0.24 species_c -0.01 163
# 4 0.72 species_a -0.03 -1408
# 5 0.72 species_b -0.03 -1533
# 6 0.72 species_c 0.00 43
# 7 0.96 species_a -0.05 -1460
# 8 0.96 species_b -0.04 -1573
# 9 0.96 species_c -0.01 -133
# 10 1.20 species_a -0.04 -1462
# 11 1.20 species_b -0.03 -1505
# 12 1.20 species_c 0.00 -398
CodePudding user response:
Create one data frame for each experiment by row binding. Pivot each of these to long. Then merge the two dataframes by time, media, and replicate number.
library(tidyr)
library(dplyr)
# note use of `.id` arg to create “replicate number” column
GFP <- bind_rows(GFP1, GFP2, GFP3, .id = "rep") %>%
pivot_longer(
starts_with("species"),
names_to = "media",
values_to = "gfp"
)
OD <- bind_rows(OD1, OD2, OD3, .id = "rep") %>%
pivot_longer(
starts_with("species"),
names_to = "media",
values_to = "od"
)
full_join(GFP, OD)
# Joining, by = c("rep", "time", "media")
# A tibble: 36 × 5
rep time media gfp od
<chr> <dbl> <chr> <dbl> <dbl>
1 1 0.24 species_a -1518 -0.03
2 1 0.24 species_b -1566 -0.03
3 1 0.24 species_c 163 -0.01
4 1 0.72 species_a -1408 -0.03
5 1 0.72 species_b -1533 -0.03
6 1 0.72 species_c 43 0
7 1 0.96 species_a -1460 -0.05
8 1 0.96 species_b -1573 -0.04
9 1 0.96 species_c -133 -0.01
10 1 1.2 species_a -1462 -0.04
# … with 26 more rows