I have the following dataset
df1 <- data.frame(
ACCESSION=c("123456","123456", "234567", "234567","345678", "345678","456789","456789"),
MRN=c("1", "1", "2", "2", "3", "3", "4", "4"),
SOURCE=c("BAL", "BAL", "BAL","BAL", "BAL", "BAL", "BW", "BW"),
TASK_TYPE=c("Stain", "Culture","Stain", "Culture","Stain", "Culture","Stain", "Culture"),
RESULT=c("Neg", "Pos","Neg", "Pos","Neg", "Pos","Neg", "Pos"))
But I would like to spread the TASK_TYPE
column into two columns and fill with the responses from the RESULT
column based on ACCESSION
(while keeping all other columns as is). So I would like it to look like this
df2 <- data.frame(
ACCESSION=c("123456","234567","345678", "456789"),
MRN=c("1", "2", "3", "4"),
SOURCE=c("BAL", "BAL", "BAL", "BW"),
STAIN=c("Neg","Neg","Neg","Neg"),
CULTURE=c("Pos","Pos","Pos","Pos"))
ACCESSION MRN SOURCE STAIN CULTURE
123456 1 BAL Neg Pos
234567 2 BAL Neg Pos
345678 3 BAL Neg Pos
456789 4 BW Neg Pos
I have looked everywhere and have come close but I haven't been able to find the exact solution. Thank you in advance!
CodePudding user response:
Use pivot_wider
from the tidyr
package:
library(tidyr)
df1 %>% pivot_wider(names_from = TASK_TYPE, values_from = RESULT)
Output:
# ACCESSION MRN SOURCE Stain Culture
# <chr> <chr> <chr> <chr> <chr>
# 1 123456 1 BAL Neg Pos
# 2 234567 2 BAL Neg Pos
# 3 345678 3 BAL Neg Pos
# 4 456789 4 BW Neg Pos
CodePudding user response:
Using dcast
library(data.table)
dcast(setDT(df1), ... ~ TASK_TYPE, value.var = 'RESULT')
-output
Key: <ACCESSION, MRN, SOURCE>
ACCESSION MRN SOURCE Culture Stain
<char> <char> <char> <char> <char>
1: 123456 1 BAL Pos Neg
2: 234567 2 BAL Pos Neg
3: 345678 3 BAL Pos Neg
4: 456789 4 BW Pos Neg
CodePudding user response:
A base R option using reshape
reshape(
df1,
direction = "wide",
idvar = c("ACCESSION", "MRN", "SOURCE"),
timevar = "TASK_TYPE"
)
gives
ACCESSION MRN SOURCE RESULT.Stain RESULT.Culture
1 123456 1 BAL Neg Pos
3 234567 2 BAL Neg Pos
5 345678 3 BAL Neg Pos
7 456789 4 BW Neg Pos