I would like to transform a semi-long dataframe to long format. However, after the reshape command, there are several warnings saying "number of items to replace is not a multiple of replacement length". When I open the new dataframe, the format is basically correct, but it says the dataframe is corrupt.
What is going on?
This is the command I use. It explicitly asks me to insert a value for idvar:
df2 = reshape(df,
direction="long",
varying=3:ncol(df),
ids="id",
idvar="newid",
timevar="category")
This is the structure of my original dataframe (actually, there are not only cars and trees, but many more categories):
id trial resp.car rt.car color.car resp.tree rt.tree color.tree
1 1 1 500 "blue" 3 765 "green"
1 1 3 534 "green" 1 455 "yellow"
1 2 2 553 "yellow" 2 794 "red"
1 2 3 577 "black" 3 834 "blue"
2 1 1 598 "green" 1 756 "red"
2 1 3 355 "yellow" 3 457 "black"
2 2 3 876 "blue" 1 767 "yellow"
2 2 2 466 "black" 1 439 "green"
Desired result:
id trial category resp rt color
1 1 "car" 1 500 "blue"
1 1 "car" 3 534 "green"
1 2 "car" 2 553 "yellow"
1 2 "car" 3 577 "black"
1 1 "tree" 3 765 "green"
1 1 "tree" 1 455 "yellow"
1 2 "tree" 2 794 "red"
1 2 "tree" 3 834 "blue"
2 1 "car" 1 598 "green"
...
CodePudding user response:
It may be easier with pivot_longer: specify the columns to be reshaped to long in cols, capture substrings of the column names with names_pattern, and name the resulting columns in names_to. The .value entry keeps the first captured part (resp, rt, color) as the names of the value columns, whereas category becomes the name of the column holding the extracted suffix (car, tree). The regex pattern anchors at the start (^) of the column name, captures zero or more characters in the first group ((.*)), matches a literal dot (\\., escaped because an unescaped . is a metacharacter that matches any character), and then captures all remaining characters in the second group ((.*)).
library(tidyr)
pivot_longer(df, cols = -c(id, trial),
names_to = c(".value", "category"), names_pattern = "^(.*)\\.(.*)")
-output
# A tibble: 16 × 6
id trial category resp rt color
<int> <int> <chr> <int> <int> <chr>
1 1 1 car 1 500 blue
2 1 1 tree 3 765 green
3 1 1 car 3 534 green
4 1 1 tree 1 455 yellow
5 1 2 car 2 553 yellow
6 1 2 tree 2 794 red
7 1 2 car 3 577 black
8 1 2 tree 3 834 blue
9 2 1 car 1 598 green
10 2 1 tree 1 756 red
11 2 1 car 3 355 yellow
12 2 1 tree 3 457 black
13 2 2 car 3 876 blue
14 2 2 tree 1 767 yellow
15 2 2 car 2 466 black
16 2 2 tree 1 439 green
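To see how the names_pattern regex splits each column name, here is a minimal base R sketch. It uses sub() on the column names rather than tidyr itself, so it only approximates what pivot_longer does with the pattern; the variable names value_part and category_part are made up for illustration.

```r
# Column names to be reshaped, copied from the example data
cols <- c("resp.car", "rt.car", "color.car",
          "resp.tree", "rt.tree", "color.tree")

# Same pattern as in the pivot_longer call
pattern <- "^(.*)\\.(.*)"

# First capture group: becomes the output column name (the ".value" part)
value_part <- sub(pattern, "\\1", cols)

# Second capture group: becomes the entry in the "category" column
category_part <- sub(pattern, "\\2", cols)

data.frame(column = cols, value = value_part, category = category_part)
```

Each name has exactly one dot here, so the greedy first group is unambiguous. With names containing several dots, the first (.*) would take everything up to the last dot.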
With reshape, we may have to pass varying as a list of column groups, one vector per output variable (here resp, rt and color), along with a unique row index for idvar:
out <- reshape(transform(df, idnew = seq_along(id)),
idvar = "idnew", varying = list(c(3, 6), c(4,7), c(5,8)), direction="long",
v.names = c('resp','rt', "color"), timevar = "category")
row.names(out) <- NULL
out
id trial idnew category resp rt color
1 1 1 1 1 1 500 blue
2 1 1 2 1 3 534 green
3 1 2 3 1 2 553 yellow
4 1 2 4 1 3 577 black
5 2 1 5 1 1 598 green
6 2 1 6 1 3 355 yellow
7 2 2 7 1 3 876 blue
8 2 2 8 1 2 466 black
9 1 1 1 2 3 765 green
10 1 1 2 2 1 455 yellow
11 1 2 3 2 2 794 red
12 1 2 4 2 3 834 blue
13 2 1 5 2 1 756 red
14 2 1 6 2 3 457 black
15 2 2 7 2 1 767 yellow
16 2 2 8 2 1 439 green
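In the output above, category holds 1 and 2 rather than "car" and "tree". A possible variation, sketched here under the assumption that the column groups are listed in car, tree order, is to pass the labels through the times argument of reshape and drop the helper column idnew afterwards:

```r
# Example data, same values as in the data section of this answer
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  trial = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  resp.car = c(1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L),
  rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L, 466L),
  color.car = c("blue", "green", "yellow", "black",
                "green", "yellow", "blue", "black"),
  resp.tree = c(3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L),
  rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L, 767L, 439L),
  color.tree = c("green", "yellow", "red", "blue",
                 "red", "black", "yellow", "green"),
  stringsAsFactors = FALSE
)

# idnew gives every row a unique id, because id and trial repeat
out <- reshape(transform(df, idnew = seq_along(id)),
               idvar = "idnew",
               varying = list(c(3, 6), c(4, 7), c(5, 8)),
               v.names = c("resp", "rt", "color"),
               timevar = "category",
               times = c("car", "tree"),  # labels instead of 1, 2
               direction = "long")

out$idnew <- NULL        # helper column is no longer needed
row.names(out) <- NULL   # replace the generated row names
out
```

The order of times has to match the order of columns inside each vector of varying (car columns first, tree columns second); otherwise the labels would be swapped.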
data
df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), trial = c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), resp.car = c(1L, 3L, 2L, 3L, 1L,
3L, 3L, 2L), rt.car = c(500L, 534L, 553L, 577L, 598L, 355L, 876L,
466L), color.car = c("blue", "green", "yellow", "black", "green",
"yellow", "blue", "black"), resp.tree = c(3L, 1L, 2L, 3L, 1L,
3L, 1L, 1L), rt.tree = c(765L, 455L, 794L, 834L, 756L, 457L,
767L, 439L), color.tree = c("green", "yellow", "red", "blue",
"red", "black", "yellow", "green")), class = "data.frame", row.names = c(NA,
-8L))