Home > Net >  Transpose multiple variable data by group ini R
Transpose multiple variable data by group ini R

Time:12-04

I have these data:

 db <- read.table(header=T, text="
ID site R S data
1 1 2 10 01/02/2021
1 1 3 20 03/02/2021
1 2 4 50 05/01/2021
2 1 7 40 02/02/2021
2 2 2 30 05/02/2021
2 2 5 60 06/02/2021
2 2 9 10 07/02/2021
3 1 2 20 02/02/2021
3 2 6 30 03/02/2021
4 1 4 40 05/02/2021
5 1 5 20 07/02/2021")

And I want to get the following result:

db_transpose <- read.table(header=T, text="
ID site R S data R_1 S_1 data_1 R_2 S_2 data_2
1 1 2 10 01/02/2021 3 20 03/02/2021 NA NA NA
1 2 4 50 05/01/2021 NA NA NA NA NA NA
2 1 7 40 02/02/2021 NA NA NA NA NA NA
2 2 2 30 05/02/2021 5 60 06/02/2021 9 10 07/02/2021
3 1 2 20 02/02/2021 NA NA NA NA NA NA
3 2 6 30 03/02/2021 NA NA NA NA NA NA
4 1 4 40 05/02/2021 NA NA NA NA NA NA
5 1 5 20 07/02/2021 NA NA NA NA NA NA")

For every combination of columns ID and site I would like to transpose data in column R, S and data by data order.

I've tried with reshape2 without result. Here've tried with S and R variables:

  require(reshape2)
  dcast(db, ID   site ~ data, value.var=c ("S", "R"))

But I obtain this error message:

Error in .subset2(x, i, exact = exact) : index out of bounds
Warning message:
In if (!(value.var %in% names(data))) { :
  the condition has length > 1 and only the first element will be used

Here I've tried only with one variable:

dcast(db, ID   site ~ data, value.var="S")
But I obtain a result totally different from what I need:
    ID site 01/02/2021 02/02/2021 03/02/2021 05/01/2021 05/02/2021 06/02/2021 07/02/2021
1  1 2         NA         NA         NA         50         NA         NA         NA
2  1 1         10         NA         20         NA         NA         NA         NA
3  2 1         NA         40         NA         NA         NA         NA         NA
4  2 2         NA         NA         NA         NA         30         60         10
5  3 1         NA         20         NA         NA         NA         NA         NA
6  3 2         NA         NA         30         NA         NA         NA         NA
7  4 1         NA         NA         NA         NA         40         NA         NA
8  5 1         NA         NA         NA         NA         NA         NA         20

Thank you

CodePudding user response:

in Base R >= 4:

transform(db, time = ave(ID, ID, site, FUN = seq_along)) |>
  reshape(dir = 'wide', idvar = c('ID', 'site'), sep = '_')

   ID site R_1 S_1     data_1 R_2 S_2     data_2 R_3 S_3     data_3
1   1 1234   2  10 01/02/2021   3  20 03/02/2021  NA  NA       <NA>
3   1 1224   4  50 05/01/2021  NA  NA       <NA>  NA  NA       <NA>
4   2 1234   7  40 02/02/2021  NA  NA       <NA>  NA  NA       <NA>
5   2 1342   2  30 05/02/2021   5  60 06/02/2021   9  10 07/02/2021
8   3 1234   2  20 02/02/2021  NA  NA       <NA>  NA  NA       <NA>
9   3 3421   6  30 03/02/2021  NA  NA       <NA>  NA  NA       <NA>
10  4 1234   4  40 05/02/2021  NA  NA       <NA>  NA  NA       <NA>
11  5 1234   5  20 07/02/2021  NA  NA       <NA>  NA  NA       <NA>

In base R < 4 do:

  reshape(transform(db, time = ave(ID, ID, site, FUN = seq_along)),
          dir = 'wide', idvar = c('ID', 'site'), sep = '_')

in tidyverse:

library(tidyverse)
db %>%
  group_by(ID, site) %>%
  mutate(name = row_number())%>%
  pivot_wider(c(ID, site), values_from = c(R,S,data), names_sep = '_')

# A tibble: 8 x 11
# Groups:   ID, site [8]
     ID  site   R_1   R_2   R_3   S_1   S_2   S_3 data_1     data_2     data_3    
  <int> <int> <int> <int> <int> <int> <int> <int> <chr>      <chr>      <chr>     
1     1  1234     2     3    NA    10    20    NA 01/02/2021 03/02/2021 NA        
2     1  1224     4    NA    NA    50    NA    NA 05/01/2021 NA         NA        
3     2  1234     7    NA    NA    40    NA    NA 02/02/2021 NA         NA        
4     2  1342     2     5     9    30    60    10 05/02/2021 06/02/2021 07/02/2021
5     3  1234     2    NA    NA    20    NA    NA 02/02/2021 NA         NA        
6     3  3421     6    NA    NA    30    NA    NA 03/02/2021 NA         NA        
7     4  1234     4    NA    NA    40    NA    NA 05/02/2021 NA         NA        
8     5  1234     5    NA    NA    20    NA    NA 07/02/2021 NA         NA    

with data.table:

library(data.table)
dcast(setDT(db), ID   site ~ rowid(ID, site), value.var = c('R', 'S', 'data'),sep = '_')
   ID site R_1 R_2 R_3 S_1 S_2 S_3     data_1     data_2     data_3
1:  1 1224   4  NA  NA  50  NA  NA 05/01/2021       <NA>       <NA>
2:  1 1234   2   3  NA  10  20  NA 01/02/2021 03/02/2021       <NA>
3:  2 1234   7  NA  NA  40  NA  NA 02/02/2021       <NA>       <NA>
4:  2 1342   2   5   9  30  60  10 05/02/2021 06/02/2021 07/02/2021
5:  3 1234   2  NA  NA  20  NA  NA 02/02/2021       <NA>       <NA>
6:  3 3421   6  NA  NA  30  NA  NA 03/02/2021       <NA>       <NA>
7:  4 1234   4  NA  NA  40  NA  NA 05/02/2021       <NA>       <NA>
8:  5 1234   5  NA  NA  20  NA  NA 07/02/2021       <NA>       <NA>
  • Related