For this data, we are looking at infection present in the udders of sheep. Each animal has a unique ID (in the column: EweID), a date sampled, udder half (R or L in this case), a sample type (BC1 or BC2), Bacteria1_ID (the name of identified bacteria in each sample), Bacteria1_Level (how much of the bacteria is present in the animal, numeric 1-4), Bacteria2_ID, and Bacteria2_Level.
The data frame has 20 rows and the following columns:
EweID DateSampled Sample UdderHalf Bacteria1_ID Bacteria1_Level
numeric date format char str char char. str. numeric
1 . BC2 L No Growth NA
1 . BC1 L Staph Aureus 3
2 . BC2 L Staph Equorum 4
2 . BC1 L Staph Oralis 2
3 . BC2 L NA NA
3 . BC1 L NA NA
4 . BC2 R NA NA
4 . BC1 R NA NA
5 . BC2 R NA NA
5 . BC1 R NA NA
The text box cuts off Bacteria2_ID and Bacteria2_Level but those columns are the last two columns from the right hand side.
I used the following code to convert from long to wide format. The goal of doing this would be to get the following new columns to replace the current bacteria columns:
Code:
MDFSO <- reshape(data=Data,
idvar= c("EweID","DateSampled", "UdderHalf"),
v.names = c("Bacteria1_ID","Bacteria1_Level", "Bacteria2_ID"),
timevar = "Sample",
direction="wide")
Desired Result Columns:
EweID | DateSampled | UdderHalf | Bacteria1_ID.BC1 | Bacteria1_Level.BC1| Bacteria1_ID.BC2 | Bacteria1_Level.BC2 | Bacteria2_ID.BC1 | Bacteria2_Level.BC1| Bacteria2_ID.BC2 | Bacteria2_Level.BC2 |
Instead, the Bacteria2_Level columns are not being converted to wide format, and these are the following columns I am getting:
EweID | DateSampled | UdderHalf | Bacteria2_Level | Bacteria1_ID.BC1 | Bacteria1_Level.BC1| Bacteria1_ID.BC2 | Bacteria1_Level.BC2 | Bacteria2_ID.BC1 | Bacteria2_ID.BC2 |
Here is the code to recreate the data frame:
Data <- data.frame(matrix(ncol = 8, nrow = 20))
colnames(DataN) <- c('EweID', 'DateSampled', 'Sample', 'UdderHalf', 'Bacteria1_ID', 'Bacteria1_Level', 'Bacteria2_ID', 'Bacteria2_Level')
Data$EweID <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10)
Data$DateSampled <- as.Date(c("2021-10-13", "2021-10-13", "2021-10-20", "2021-10-20", "2021-10-27", "2021-10-27", "2021-11-03", "2021-11-03", "2021-11-10", "2021-11-10", "2021-11-17", "2021-11-17", "2021-11-24", "2021-11-24", "2021-12-01", "2021-12-01", "2021-10-13", "2021-10-13", "2021-10-20", "2021-10-20"))
Data$Sample <- c("BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1")
Data$UdderHalf <- c("L","L","L","L","L","L","L","L","L","L","L","L","L","L","L","L","R","R","R", "R")
Data$Bacteria1_ID <- c("No Growth", "Staph Auerus", "Staph Equorum", "Staph Oralis", "No Growth","No Growth", "No Growth", "No Growth", "No Growth",NA, NA, NA, "Staph Sp", "Staph Auerus", "Staph Oralis", NA, NA, NA,"No Growth","No Growth")
Data$Bacteria1_Level <- c(NA, 3, 4, 2, NA, NA, NA, NA, NA, NA, NA, NA, 2, 1, 4, NA, NA, NA,NA,NA)
Data$Bacteria2_ID <- c("No Growth", "Staph Auerus", "Staph Sp", NA, NA, NA, "Staph Aureus", "No Growth", NA, "No Growth", "No Growth", "No Growth", "No Growth", "No Growth", NA, "Staph Sp", "Staph Aureus", NA, NA, NA )
Data$Bacteria2_Level <- c(NA, 4, 1, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, 2, 1, NA, NA, NA)
I appreciate your help. Thank you.
CodePudding user response:
Seems like all you're missing is "Bacteria2_Level" from the v.names argument.
MDFSO <- reshape(
data=Data,
idvar= c("EweID","DateSampled", "UdderHalf"),
v.names = c("Bacteria1_ID","Bacteria1_Level", "Bacteria2_ID", "Bacteria2_Level"),
timevar = "Sample",
direction="wide"
)
CodePudding user response:
This is easier accomplished using the pivot_wider()
function from the tidyr package.
library(tidyr)
pivot_wider(Data,
id_cols=c(EweID, DateSampled, UdderHalf),
names_from = Sample,
values_from = c(Bacteria1_ID, Bacteria1_Level, Bacteria2_ID, Bacteria2_Level))
EweID DateSampled UdderHalf Bacteria1_ID_BC2 Bacteria1_ID_BC1 Bacteria1_Level_BC2 Bacteria1_Level_BC1 Bacteria2_ID_BC2 Bacteria2_ID_BC1 Bacteria2_Level_BC2 Bacteria2_Level_BC1
<dbl> <date> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
1 1 2021-10-13 L No Growth Staph Auerus NA 3 No Growth Staph Auerus NA 4
2 2 2021-10-20 L Staph Equorum Staph Oralis 4 2 Staph Sp NA 1 NA
3 3 2021-10-27 L No Growth No Growth NA NA NA NA NA NA
4 4 2021-11-03 L No Growth No Growth NA NA Staph Aureus No Growth 2 NA
5 5 2021-11-10 L No Growth NA NA NA NA No Growth NA NA
...
To rearrange the columns:
answer[, c("EweID", "DateSampled", "UdderHalf", "Bacteria1_ID_BC1", "Bacteria1_Level_BC1", "Bacteria1_ID_BC2", "Bacteria1_Level_BC2", "Bacteria2_ID_BC1", "Bacteria2_Level_BC1", "Bacteria2_ID_BC2", "Bacteria2_Level_BC2")]