"number of items to replace is not a multiple of replacement length" when reshaping from s

Time:03-11

I would like to transform a semi-long dataframe to long format. However, the reshape command produces several warnings saying "number of items to replace is not a multiple of replacement length". When I open the new dataframe, the format is basically correct, but it is reported as corrupt.

What is going on?

This is the command I use. reshape explicitly asks me to supply a value for idvar:

df2 = reshape(df,
              direction="long",
              varying=3:ncol(df),
              ids="id",
              idvar="newid",
              timevar="category")
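For context on why reshape insists on idvar: its default is idvar = "id", and reshape needs the idvar column to identify each row of the wide data uniquely. In the sample data the existing id column repeats, and even the id/trial pair is not unique. The check below is a small sketch using only those two columns from the sample data; idnew is just an illustrative per-row counter.

```r
# The id and trial columns from the sample data
df <- data.frame(id = rep(1:2, each = 4), trial = rep(c(1L, 1L, 2L, 2L), 2))

# anyDuplicated() returns the index of the first repeated row, or 0 if all are unique
dup_id       <- anyDuplicated(df$id)                # 2: id 1 appears again in row 2
dup_id_trial <- anyDuplicated(df[c("id", "trial")]) # 2: rows 1 and 2 are both id 1, trial 1

# A per-row counter is unique by construction
df$idnew  <- seq_along(df$id)
dup_idnew <- anyDuplicated(df$idnew)                # 0
```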

This is the structure of my original dataframe (actually, there are not only cars and trees, but many more categories):

id  trial  resp.car rt.car color.car resp.tree rt.tree color.tree
 1      1         1    500    "blue"         3     765    "green"
 1      1         3    534   "green"         1     455   "yellow"
 1      2         2    553  "yellow"         2     794      "red"
 1      2         3    577   "black"         3     834     "blue"
 2      1         1    598   "green"         1     756      "red"
 2      1         3    355  "yellow"         3     457    "black"
 2      2         3    876    "blue"         1     767   "yellow"
 2      2         2    466   "black"         1     439    "green"

Desired result:

id  trial  category   resp        rt     color
 1      1     "car"      1       500    "blue"    
 1      1     "car"      3       534   "green"  
 1      2     "car"      2       553  "yellow"     
 1      2     "car"      3       577   "black"    
 1      1    "tree"      3       765   "green"     
 1      1    "tree"      1       455  "yellow"    
 1      2    "tree"      2       794     "red"     
 1      2    "tree"      3       834    "blue"     
 2      1     "car"      1       598   "green"
 ...

Answer:

It may be easier with pivot_longer: specify the columns to be reshaped in cols, capture the two parts of each column name with names_pattern, and name those parts in names_to. The ".value" entry means that the first captured part (resp, rt, color) becomes a column name holding the values, whereas "category" receives the second captured part (car, tree). In the regex ^(.*)\\.(.*), ^ anchors the match at the start of the column name, the first capture group (.*) matches any characters (greedily, so up to the last dot), \\. matches a literal dot (it must be escaped because an unescaped . matches any character), and the second capture group (.*) matches everything that follows.
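To preview the split that names_pattern performs, the same regular expression can be applied to the column names with base R's sub(). This is only an illustration (pivot_longer does the splitting internally); the variable names nms, measure and category are chosen for the example.

```r
# Column names from the sample data
nms <- c("resp.car", "rt.car", "color.car", "resp.tree", "rt.tree", "color.tree")
pattern <- "^(.*)\\.(.*)"  # same pattern as in the pivot_longer() call

# Replace each whole name by its first capture group: the part before the (last) dot
measure <- sub(pattern, "\\1", nms)
# Replace each whole name by its second capture group: the part after the dot
category <- sub(pattern, "\\2", nms)

measure   # c("resp", "rt", "color", "resp", "rt", "color")
category  # c("car", "car", "car", "tree", "tree", "tree")
```

Because the first group is greedy, a name with several dots (for example "a.b.c") would be split at the last dot.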

library(tidyr)
pivot_longer(df, cols = -c(id, trial), 
  names_to = c(".value", "category"), names_pattern = "^(.*)\\.(.*)")

Output:

# A tibble: 16 × 6
      id trial category  resp    rt color 
   <int> <int> <chr>    <int> <int> <chr> 
 1     1     1 car          1   500 blue  
 2     1     1 tree         3   765 green 
 3     1     1 car          3   534 green 
 4     1     1 tree         1   455 yellow
 5     1     2 car          2   553 yellow
 6     1     2 tree         2   794 red   
 7     1     2 car          3   577 black 
 8     1     2 tree         3   834 blue  
 9     2     1 car          1   598 green 
10     2     1 tree         1   756 red   
11     2     1 car          3   355 yellow
12     2     1 tree         3   457 black 
13     2     2 car          3   876 blue  
14     2     2 tree         1   767 yellow
15     2     2 car          2   466 black 
16     2     2 tree         1   439 green 
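pivot_longer keeps the rows coming from one original row together, so car and tree alternate. The desired result in the question groups rows by id and then category instead. A minimal sketch of one way to get that order, assuming a base R sort afterwards is acceptable (long and sorted are illustrative names):

```r
library(tidyr)

# Sample data from the question
df <- data.frame(
  id = rep(1:2, each = 4),
  trial = rep(c(1L, 1L, 2L, 2L), 2),
  resp.car = c(1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L),
  rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 466L),
  color.car = c("blue", "green", "yellow", "black", "green", "yellow", "blue", "black"),
  resp.tree = c(3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L),
  rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 767L, 439L),
  color.tree = c("green", "yellow", "red", "blue", "red", "black", "yellow", "green")
)

long <- pivot_longer(df, cols = -c(id, trial),
                     names_to = c(".value", "category"),
                     names_pattern = "^(.*)\\.(.*)")

# order() is stable, so within each id/category the original trial order is kept
sorted <- long[order(long$id, long$category), ]
sorted
```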

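With reshape, varying can be passed as a list in which each element groups the columns of one measure across categories (resp.car and resp.tree, rt.car and rt.tree, color.car and color.tree), together with an idvar that is unique for every row. Since id repeats, a row counter idnew is added first: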

out <- reshape(transform(df, idnew = seq_along(id)),  # add a unique row id
               idvar = "idnew",
               # one element per measure: resp, rt and color columns for car and tree
               varying = list(c(3, 6), c(4, 7), c(5, 8)),
               v.names = c("resp", "rt", "color"),
               timevar = "category",
               direction = "long")

row.names(out) <- NULL
out
   id trial idnew category resp  rt  color
1   1     1     1        1    1 500   blue
2   1     1     2        1    3 534  green
3   1     2     3        1    2 553 yellow
4   1     2     4        1    3 577  black
5   2     1     5        1    1 598  green
6   2     1     6        1    3 355 yellow
7   2     2     7        1    3 876   blue
8   2     2     8        1    2 466  black
9   1     1     1        2    3 765  green
10  1     1     2        2    1 455 yellow
11  1     2     3        2    2 794    red
12  1     2     4        2    3 834   blue
13  2     1     5        2    1 756    red
14  2     1     6        2    3 457  black
15  2     2     7        2    1 767 yellow
16  2     2     8        2    1 439  green
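Here category holds 1 and 2, the positions of car and tree within each varying element. Since the real data has many more categories, typing column positions by hand becomes error-prone. The sketch below builds varying from the column names instead and passes the category labels through reshape's times argument. It assumes every measure column is named measure.category and that each category has all three measures; measures, cats and out2 are illustrative names.

```r
# Sample data from the question
df <- data.frame(
  id = rep(1:2, each = 4),
  trial = rep(c(1L, 1L, 2L, 2L), 2),
  resp.car = c(1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L),
  rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 466L),
  color.car = c("blue", "green", "yellow", "black", "green", "yellow", "blue", "black"),
  resp.tree = c(3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L),
  rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 767L, 439L),
  color.tree = c("green", "yellow", "red", "blue", "red", "black", "yellow", "green")
)

measures  <- c("resp", "rt", "color")
wide_cols <- setdiff(names(df), c("id", "trial"))
# Category = everything after the last dot, e.g. "car", "tree"
cats <- unique(sub("^.*\\.", "", wide_cols))
# One character vector per measure, e.g. c("resp.car", "resp.tree")
varying <- lapply(measures, function(m) paste(m, cats, sep = "."))

out2 <- reshape(transform(df, idnew = seq_along(id)),
                idvar = "idnew",
                varying = varying,
                v.names = measures,
                times = cats,        # store the labels instead of 1, 2, ...
                timevar = "category",
                direction = "long")
row.names(out2) <- NULL
out2
```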

Data:

structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), trial = c(1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), resp.car = c(1L, 3L, 2L, 3L, 1L, 
3L, 3L, 2L), rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 
466L), color.car = c("blue", "green", "yellow", "black", "green", 
"yellow", "blue", "black"), resp.tree = c(3L, 1L, 2L, 3L, 1L, 
3L, 1L, 1L), rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 
767L, 439L), color.tree = c("green", "yellow", "red", "blue", 
"red", "black", "yellow", "green")), class = "data.frame", row.names = c(NA, 
-8L))