Home > other >  Flattening/widening a dataset to show multiple trials of a single analyte in one row
Flattening/widening a dataset to show multiple trials of a single analyte in one row

Time:10-30

I'm trying to flatten/widen my data frame to sort by the Sample Name. Multiple trials have been completed on each individual sample, and I'd like to arrange all trials into a single row.

Sample data:

Sample_Name <- c("M1","M1","M1","M1","M2","M2","M2","M2")
test_ID <- c("Gen1 Spec1", "Gen2 Spec2", "Gen2 Spec2", "Gen2 Spec2", "Gen3 Spec3", "Gen3 Spec3", "Gen4 Spec4", "Gen4 Spec4")
MScore <- c(2.2, 1.9, 2.1, 2.0, 1.0, 2.0, 1.4, 1.5)
Test_Data <-data.frame(Sample_Name, test_ID, MScore)

Shape of my desired output:

Target_Sample_Name <-c("M1","M2")
Trial_1_ID <-c("Gen1 Spec1", "Gen2 Spec2")
Trial_1_Score <-c(2.2, 1.0)
Trial_2_ID<-c("Gen2 Spec2", "Gen3 Spec3")
Trial_2_Score<-c(1.9, 2.0)
Trial_3_ID<-c("Gen2 Spec2", "Gen4 Spec4" )
Trial_3_Score<-c(2.1 , 1.4)
Trial_4_ID<-c("Gen2 Spec2","Gen4 Spec4" )
Trial_4_Score<-c(2.0, 1.5 )
Desired_Output <- data.frame(Target_Sample_Name, Trial_1_ID, Trial_1_Score, Trial_2_ID, Trial_2_Score, Trial_3_ID, Trial_3_Score, Trial_4_ID, Trial_4_Score)

I'm sure there's a better way to actually show what I'm looking to do, but I'm super new and haven't found it yet.

I've tried to use aggregate, but I can't figure out what FUN to use. I've also tried using the tibble pivot_wider function but I couldn't make it work. I get that this is a kind of strange way to organize my data, but I promise it makes sense in the context of my project!

Thank you!

CodePudding user response:

You could use

library(dplyr)
library(tidyr)

Test_Data %>% 
  group_by(Sample_Name) %>% 
  mutate(rn = row_number()) %>% 
  pivot_wider(id_cols = Sample_Name,
              names_from = rn,
              names_glue = "{.value}_{rn}",
              values_from = c("test_ID", "MScore")) %>% 
  rename_with(~gsub("test_ID_(\\d )", "Trail_\\1_ID", .x), starts_with("test_ID")) %>% 
  rename_with(~gsub("MScore_(\\d )", "Trail_\\1_Score", .x), starts_with("MScore")) %>% 
  select(colnames(.)[order(colnames(.))]) %>%
  ungroup()

This returns

# A tibble: 2 x 9
  Sample_Name Trail_1_ID Trail_1_Score Trail_2_ID Trail_2_Score Trail_3_ID Trail_3_Score
  <chr>       <chr>              <dbl> <chr>              <dbl> <chr>              <dbl>
1 M1          Gen1 Spec1           2.2 Gen2 Spec2           1.9 Gen2 Spec2           2.1
2 M2          Gen3 Spec3           1   Gen3 Spec3           2   Gen4 Spec4           1.4
# ... with 2 more variables: Trail_4_ID <chr>, Trail_4_Score <dbl>
  • Related