When trying to convert from long to wide format using "reshape" function in R, the functio-CodePudding

For this data, we are looking at infection present in the udders of sheep. Each animal has a unique ID (in the column: EweID), a date sampled, udder half (R or L in this case), a sample type (BC1 or BC2), Bacteria1_ID (the name of identified bacteria in each sample), Bacteria1_Level (how much of the bacteria is present in the animal, numeric 1-4), Bacteria2_ID, and Bacteria2_Level.

The data frame has 20 rows and the following columns:

EweID      DateSampled      Sample          UdderHalf        Bacteria1_ID    Bacteria1_Level    
numeric    date format     char str           char            char. str.       numeric
1               .             BC2               L             No Growth          NA
1               .             BC1               L            Staph Aureus         3
2               .             BC2               L            Staph Equorum        4
2               .             BC1               L            Staph Oralis         2
3               .             BC2               L                NA              NA
3               .             BC1               L                NA              NA
4               .             BC2               R                NA              NA
4               .             BC1               R                NA              NA
5               .             BC2               R                NA              NA
5               .             BC1               R                NA              NA

The text box cuts off Bacteria2_ID and Bacteria2_Level but those columns are the last two columns from the right hand side.

I used the following code to convert from long to wide format. The goal of doing this would be to get the following new columns to replace the current bacteria columns:

Code:

MDFSO <- reshape(data=Data,
                          idvar= c("EweID","DateSampled", "UdderHalf"),          
                          v.names = c("Bacteria1_ID","Bacteria1_Level", "Bacteria2_ID"),
                          timevar = "Sample",
                          direction="wide")

Desired Result Columns:

EweID | DateSampled | UdderHalf | Bacteria1_ID.BC1 | Bacteria1_Level.BC1| Bacteria1_ID.BC2 | Bacteria1_Level.BC2 | Bacteria2_ID.BC1 | Bacteria2_Level.BC1| Bacteria2_ID.BC2 | Bacteria2_Level.BC2 |

Instead, the Bacteria2_Level columns are not being converted to wide format, and these are the following columns I am getting:

EweID | DateSampled | UdderHalf | Bacteria2_Level | Bacteria1_ID.BC1 | Bacteria1_Level.BC1| Bacteria1_ID.BC2 | Bacteria1_Level.BC2 | Bacteria2_ID.BC1 | Bacteria2_ID.BC2 |

Here is the code to recreate the data frame:

Data <- data.frame(matrix(ncol = 8, nrow = 20))
colnames(DataN) <- c('EweID', 'DateSampled', 'Sample', 'UdderHalf', 'Bacteria1_ID', 'Bacteria1_Level', 'Bacteria2_ID', 'Bacteria2_Level')

Data$EweID <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10)

Data$DateSampled <- as.Date(c("2021-10-13", "2021-10-13", "2021-10-20", "2021-10-20", "2021-10-27", "2021-10-27", "2021-11-03", "2021-11-03", "2021-11-10", "2021-11-10", "2021-11-17", "2021-11-17", "2021-11-24", "2021-11-24", "2021-12-01", "2021-12-01", "2021-10-13", "2021-10-13", "2021-10-20", "2021-10-20"))

Data$Sample <- c("BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1","BC2", "BC1")

Data$UdderHalf <- c("L","L","L","L","L","L","L","L","L","L","L","L","L","L","L","L","R","R","R", "R")

Data$Bacteria1_ID <- c("No Growth", "Staph Auerus", "Staph Equorum", "Staph Oralis", "No Growth","No Growth", "No Growth", "No Growth", "No Growth",NA, NA, NA, "Staph Sp", "Staph Auerus", "Staph Oralis", NA, NA, NA,"No Growth","No Growth")

Data$Bacteria1_Level <- c(NA, 3, 4, 2, NA, NA, NA, NA, NA, NA, NA, NA, 2, 1, 4, NA, NA, NA,NA,NA)

Data$Bacteria2_ID <- c("No Growth", "Staph Auerus", "Staph Sp", NA, NA, NA, "Staph Aureus", "No Growth", NA, "No Growth", "No Growth", "No Growth", "No Growth", "No Growth", NA, "Staph Sp", "Staph Aureus", NA, NA, NA )

Data$Bacteria2_Level <- c(NA, 4, 1, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, 2, 1, NA, NA, NA)

I appreciate your help. Thank you.

CodePudding user response：

Seems like all you're missing is "Bacteria2_Level" from the v.names argument.

MDFSO <- reshape(
  data=Data,
  idvar= c("EweID","DateSampled", "UdderHalf"),          
  v.names = c("Bacteria1_ID","Bacteria1_Level", "Bacteria2_ID", "Bacteria2_Level"),
  timevar = "Sample",
  direction="wide"
)

CodePudding user response：

This is easier accomplished using the pivot_wider() function from the tidyr package.

library(tidyr)
pivot_wider(Data, 
            id_cols=c(EweID, DateSampled, UdderHalf),
            names_from = Sample, 
            values_from = c(Bacteria1_ID, Bacteria1_Level, Bacteria2_ID, Bacteria2_Level))


  EweID DateSampled UdderHalf Bacteria1_ID_BC2 Bacteria1_ID_BC1 Bacteria1_Level_BC2 Bacteria1_Level_BC1 Bacteria2_ID_BC2 Bacteria2_ID_BC1 Bacteria2_Level_BC2 Bacteria2_Level_BC1
   <dbl> <date>      <chr>     <chr>            <chr>                          <dbl>               <dbl> <chr>            <chr>                          <dbl>               <dbl>
 1     1 2021-10-13  L         No Growth        Staph Auerus                      NA                   3 No Growth        Staph Auerus                      NA                   4
 2     2 2021-10-20  L         Staph Equorum    Staph Oralis                       4                   2 Staph Sp         NA                                 1                  NA
 3     3 2021-10-27  L         No Growth        No Growth                         NA                  NA NA               NA                                NA                  NA
 4     4 2021-11-03  L         No Growth        No Growth                         NA                  NA Staph Aureus     No Growth                          2                  NA
 5     5 2021-11-10  L         No Growth        NA                                NA                  NA NA               No Growth                         NA                  NA
...

To rearrange the columns:

answer[, c("EweID", "DateSampled", "UdderHalf", "Bacteria1_ID_BC1", "Bacteria1_Level_BC1", "Bacteria1_ID_BC2",  "Bacteria1_Level_BC2", "Bacteria2_ID_BC1", "Bacteria2_Level_BC1", "Bacteria2_ID_BC2", "Bacteria2_Level_BC2")]