I'm trying to flatten/widen my data frame to sort by the Sample Name. Multiple trials have been completed on each individual sample, and I'd like to arrange all trials into a single row.
Sample data:
Sample_Name <- c("M1","M1","M1","M1","M2","M2","M2","M2")
test_ID <- c("Gen1 Spec1", "Gen2 Spec2", "Gen2 Spec2", "Gen2 Spec2", "Gen3 Spec3", "Gen3 Spec3", "Gen4 Spec4", "Gen4 Spec4")
MScore <- c(2.2, 1.9, 2.1, 2.0, 1.0, 2.0, 1.4, 1.5)
Test_Data <-data.frame(Sample_Name, test_ID, MScore)
Shape of my desired output:
Target_Sample_Name <-c("M1","M2")
Trial_1_ID <-c("Gen1 Spec1", "Gen2 Spec2")
Trial_1_Score <-c(2.2, 1.0)
Trial_2_ID<-c("Gen2 Spec2", "Gen3 Spec3")
Trial_2_Score<-c(1.9, 2.0)
Trial_3_ID<-c("Gen2 Spec2", "Gen4 Spec4" )
Trial_3_Score<-c(2.1 , 1.4)
Trial_4_ID<-c("Gen2 Spec2","Gen4 Spec4" )
Trial_4_Score<-c(2.0, 1.5 )
Desired_Output <- data.frame(Target_Sample_Name, Trial_1_ID, Trial_1_Score, Trial_2_ID, Trial_2_Score, Trial_3_ID, Trial_3_Score, Trial_4_ID, Trial_4_Score)
I'm sure there's a better way to actually show what I'm looking to do, but I'm super new and haven't found it yet.
I've tried to use aggregate, but I can't figure out what FUN to use. I've also tried using the tibble pivot_wider function but I couldn't make it work. I get that this is a kind of strange way to organize my data, but I promise it makes sense in the context of my project!
Thank you!
CodePudding user response:
You could use
library(dplyr)
library(tidyr)
Test_Data %>%
group_by(Sample_Name) %>%
mutate(rn = row_number()) %>%
pivot_wider(id_cols = Sample_Name,
names_from = rn,
names_glue = "{.value}_{rn}",
values_from = c("test_ID", "MScore")) %>%
rename_with(~gsub("test_ID_(\\d )", "Trail_\\1_ID", .x), starts_with("test_ID")) %>%
rename_with(~gsub("MScore_(\\d )", "Trail_\\1_Score", .x), starts_with("MScore")) %>%
select(colnames(.)[order(colnames(.))]) %>%
ungroup()
This returns
# A tibble: 2 x 9
Sample_Name Trail_1_ID Trail_1_Score Trail_2_ID Trail_2_Score Trail_3_ID Trail_3_Score
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 M1 Gen1 Spec1 2.2 Gen2 Spec2 1.9 Gen2 Spec2 2.1
2 M2 Gen3 Spec3 1 Gen3 Spec3 2 Gen4 Spec4 1.4
# ... with 2 more variables: Trail_4_ID <chr>, Trail_4_Score <dbl>