Home > database >  How do i combine several replicate wide format dataframes into a single long format dataframe?
How do i combine several replicate wide format dataframes into a single long format dataframe?

Time:11-02

i am having difficulties combining/mergin 6 replicate dataframes into a single long format dataframe. By replicates, I mean two data parameters measured from the same experiment repeated 3 times.

I have made small examples of what the data looks like:

GFP exp 1.

GFP1:

  time    species a    species b       species c . . .
1 0.24        -1518        -1566             163 . . .
2 0.72        -1408        -1533              43 . . .
3 0.96        -1460        -1573            -133 . . .
4 1.20        -1462        -1505            -398 . . .

(and GFP2 & GFP3 with same format but different values)

OD exp 1. OD1:

  time    species a    species b       species c . . . 
1 0.24        -0.03        -0.03           -0.01 . . .
2 0.72        -0.03        -0.03            0.00 . . .
3 0.96        -0.05        -0.04           -0.01 . . .
4 1.20        -0.04        -0.03            0.00 . . . 

(and then OD2 & OD3 with same format but different values)

This means i have 6 seperate datasets, with same amount of columns and rows but with different data values. What i would like to end up with is one long format dataframe that looks like this, preferably with an experiment id added:

   time  experiment      media    od   gfp
1  0.24           1  species a -0.03 -1518
2  0.24           2  species a -0.02 -1519
3  0.24           3  species a -0.02 -1242
4  0.72           1  species a -0.03 -1408
5  0.72           2  species a -0.04 -1470
6  0.72           3  species a -0.04 -1228
7  0.96           1  species a -0.05 -1460
8  0.96           2  species a -0.06 -1478
9  0.96           3  species a -0.05 -1214
10 1.20           1  species a -0.04 -1462
11 1.20           2  species a -0.05 -1402
12 1.20           3  species a -0.06 -1227

I hope everything makes sense, if not i will be happy to elaborate. Thanks in advance.

CodePudding user response:

Two ways to tackle this:

put all the datasets in a list and then bind them together then reshape:

lst(od, gfp) %>%
  bind_rows(.id = 'name')%>%
  reshape2::recast(time   variable ~ name, id.var = c('name', 'time'))

 time  variable   gfp    od
1  0.24 species_a -1518 -0.03
2  0.24 species_b -1566 -0.03
3  0.24 species_c   163 -0.01
4  0.72 species_a -1408 -0.03
5  0.72 species_b -1533 -0.03
6  0.72 species_c    43  0.00
7  0.96 species_a -1460 -0.05
8  0.96 species_b -1573 -0.04
9  0.96 species_c  -133 -0.01
10 1.20 species_a -1462 -0.04
11 1.20 species_b -1505 -0.03
12 1.20 species_c  -398  0.00

Second is an extension of the previous method:

lst(od, gfp)%>%
  imap(~pivot_longer(.,-time, values_to = .y))%>%
  reduce(full_join)

 A tibble: 12 × 4
    time name         od   gfp
   <dbl> <chr>     <dbl> <int>
 1  0.24 species_a -0.03 -1518
 2  0.24 species_b -0.03 -1566
 3  0.24 species_c -0.01   163
 4  0.72 species_a -0.03 -1408
 5  0.72 species_b -0.03 -1533
 6  0.72 species_c  0       43
 7  0.96 species_a -0.05 -1460
 8  0.96 species_b -0.04 -1573
 9  0.96 species_c -0.01  -133
10  1.2  species_a -0.04 -1462
11  1.2  species_b -0.03 -1505
12  1.2  species_c  0     -398

CodePudding user response:

To further illustrate my comment:

od <- read.table(text = "time    species_a    species_b       species_c
0.24        -0.03        -0.03           -0.01
0.72        -0.03        -0.03            0.00
0.96        -0.05        -0.04           -0.01
1.20        -0.04        -0.03            0.00", 
  header = TRUE, sep = "", check.names = F)
od


gfp <- read.table(text = "time    species_a    species_b       species_c
  0.24        -1518        -1566             163
  0.72        -1408        -1533              43
  0.96        -1460        -1573            -133
  1.20        -1462        -1505            -398", 
  header = TRUE, sep = "", check.names = F)
gfp


library(reshape2)
od2 <- melt(data = od, id.vars = "time", value.name = "od")
gfp2 <- melt(data = gfp, id.vars = "time", value.name = "gfp")

(df <- merge(x = od2, y = gfp2, all = TRUE))

#    time  variable    od   gfp
# 1  0.24 species_a -0.03 -1518
# 2  0.24 species_b -0.03 -1566
# 3  0.24 species_c -0.01   163
# 4  0.72 species_a -0.03 -1408
# 5  0.72 species_b -0.03 -1533
# 6  0.72 species_c  0.00    43
# 7  0.96 species_a -0.05 -1460
# 8  0.96 species_b -0.04 -1573
# 9  0.96 species_c -0.01  -133
# 10 1.20 species_a -0.04 -1462
# 11 1.20 species_b -0.03 -1505
# 12 1.20 species_c  0.00  -398

CodePudding user response:

Create one data frame for each experiment by row binding. Pivot each of these to long. Then merge the two dataframes by time, media, and replicate number.

library(tidyr)
library(dplyr) 

# note use of `.id` arg to create “replicate number” column
GFP <- bind_rows(GFP1, GFP2, GFP3, .id = "rep") %>%
  pivot_longer(
    starts_with("species"),
    names_to = "media",
    values_to = "gfp"
  )

OD <- bind_rows(OD1, OD2, OD3, .id = "rep") %>%
  pivot_longer(
    starts_with("species"),
    names_to = "media",
    values_to = "od"
  )

full_join(GFP, OD)
# Joining, by = c("rep", "time", "media")
# A tibble: 36 × 5
   rep    time media       gfp    od
   <chr> <dbl> <chr>     <dbl> <dbl>
 1 1      0.24 species_a -1518 -0.03
 2 1      0.24 species_b -1566 -0.03
 3 1      0.24 species_c   163 -0.01
 4 1      0.72 species_a -1408 -0.03
 5 1      0.72 species_b -1533 -0.03
 6 1      0.72 species_c    43  0   
 7 1      0.96 species_a -1460 -0.05
 8 1      0.96 species_b -1573 -0.04
 9 1      0.96 species_c  -133 -0.01
10 1      1.2  species_a -1462 -0.04
# … with 26 more rows
  • Related