Home > Software engineering >  Long to wide with multiple columns
Long to wide with multiple columns

Time:11-25

I have a data frame with the columns sampleID, method, parameter and value.

set.seed(123)

mydata <- data.frame(sample_ID = rep(1:100, each=4),
                     method = rep(LETTERS[1:2], 100),
                     parameter = rep(c("M1","M2"),times=c(2,2)),
                     value = round(runif(100, min = 100, max = 5000)), 
                     stringsAsFactors = FALSE)

This data frame is organized in long format and I would like to convert it to wide format like this: The sample_ID should be the identifier of the row - now the columns method parameter should be combined with the corresponding value, f.e.

Sample_ID 1 has the value

  • 1509 for method A and parameter M1
  • 3963 for method B and parameter M1
  • 2104 for method A and parameter M2
  • 4427 for method B and parameter M2

Now I would like to convert these 4 rows to a single row like this:

sample_ID = 1, A_M1 = 1509, B_M1 = 3963, A_M2 = 2104, B_M2 = 4427

The next row would be consist of those variables with sample_ID = 2, ...

I'm sorry but I was not able to do this with spread() or melt().

Thank you in advance!

CodePudding user response:

Making use of tidyr::pivot_wider you could do:

tidyr::pivot_wider(mydata, names_from = c("method", "parameter"), values_from = value)
#> # A tibble: 100 × 5
#>    sample_ID  A_M1  B_M1  A_M2  B_M2
#>        <int> <dbl> <dbl> <dbl> <dbl>
#>  1         1  1509  3963  2104  4427
#>  2         2  4708   323  2688  4473
#>  3         3  2802  2337  4788  2321
#>  4         4  3420  2906   604  4509
#>  5         5  1306   306  1707  4777
#>  6         6  4459  3495  3238  4972
#>  7         7  3313  3572  2766  3011
#>  8         8  1517   821  4819  4521
#>  9         9  3484  3998   221  2441
#> 10        10  3816  1160  1659  1235
#> # … with 90 more rows

CodePudding user response:

Using dcast() from data.table package

dcast(mydata, sample_ID~...)

If you convert your data.frame to a data.table first using setDT() you can express this using the proper data.table notation

mydata[, dcast(.SD, sample_ID~...)]
  •  Tags:  
  • r
  • Related