my example dataset is wide and contains these values:
olddata_wide <- read.table(header=TRUE, text='
subject sex a b c a1 b1 c1 a2 b2 c2
1 M 7.9 12.3 10.7 7.5 12.1 10.3 8.1 12.5 10.9
2 F 6.3 10.6 11.1 6.0 10.4 11.0 6.5 10.9 11.4
3 F 9.5 13.1 13.8 9.3 13.0 13.5 9.8 13.5 13.9
4 M 11.5 13.4 12.9 11.2 13.5 12.7 11.7 13.6 13.9
')
I would like to convert it to a long dataset. The problem is that I want to use multiple keycols at the same time - I want the columns a
& b
& c
to become one long column called value
, as well as columns a1
& b1
& c1
to value1
and a2
& b2
& c2
to value3
. so the desired outcome is:
subject sex value valueType value1 valueType1 value2 valueType2
1: 1 M 7.9 a 7.5 a1 8.1 a2
2: 2 F 6.3 a 6.0 a1 6.5 a2
3: 3 F 9.5 a 9.3 a1 9.8 a2
4: 4 M 11.5 a 11.2 a1 11.7 a2
5: 1 M 12.3 b 12.1 b1 12.5 b2
6: 2 F 10.6 b 10.4 b1 10.9 b2
7: 3 F 13.1 b 13.0 b1 13.5 b2
8: 4 M 13.4 b 13.5 b1 13.6 b2
9: 1 M 10.7 c 10.3 c1 10.9 c2
10: 2 F 11.1 c 11.0 c1 11.4 c2
11: 3 F 13.8 c 13.5 c1 13.9 c2
12: 4 M 12.9 c 12.7 c1 13.9 c2
I know how to get the desired outcome programatically for one key column:
keycol <- "valueType"
valuecol <- "value"
gathercols <- c("a", "b", "c")
gather_(olddata_wide, keycol, valuecol, gathercols)
but how can I do this for multiple keycols at the same time?
CodePudding user response:
Here is one approach using melt()
and dcast()
from the data.table
package.
setDT(olddata_wide)
setnames(olddata_wide, old=c("a","b","c"), new=c("a0", "b0", "c0"))
df_long = dcast(
melt(olddata_wide,c("subject","sex"),variable.name = "v")[,(c("v", "t")):=tstrsplit(v,"")],
subject sex v~t, value.var="value",
)
df_long[, .(subject, sex, value=`0`,valueType=v, value1=`1`,valueType1 = paste0(v,"1"), value2=`2`,valueType2 = paste0(v,"2"))]
Output:
subject sex value valueType value1 valueType1 value2 valueType2
1: 1 M 7.9 a 7.5 a1 8.1 a2
2: 1 M 12.3 b 12.1 b1 12.5 b2
3: 1 M 10.7 c 10.3 c1 10.9 c2
4: 2 F 6.3 a 6.0 a1 6.5 a2
5: 2 F 10.6 b 10.4 b1 10.9 b2
6: 2 F 11.1 c 11.0 c1 11.4 c2
7: 3 F 9.5 a 9.3 a1 9.8 a2
8: 3 F 13.1 b 13.0 b1 13.5 b2
9: 3 F 13.8 c 13.5 c1 13.9 c2
10: 4 M 11.5 a 11.2 a1 11.7 a2
11: 4 M 13.4 b 13.5 b1 13.6 b2
12: 4 M 12.9 c 12.7 c1 13.9 c2
CodePudding user response:
Here is a (fairly clunky) tidyverse
approach:
olddata_wide %>%
pivot_longer(matches("^[abc]"), names_to = "valueType") %>%
mutate(suffix = str_remove(valueType, "^.")) %>%
pivot_wider(
names_from = "suffix", values_from = c("value", "valueType"), names_sep = "", values_fn = list) %>%
unnest(matches("value"))
## A tibble: 12 × 8
# subject sex value value1 value2 valueType valueType1 valueType2
# <int> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr>
# 1 1 M 7.9 7.5 8.1 a a1 a2
# 2 1 M 12.3 12.1 12.5 b b1 b2
# 3 1 M 10.7 10.3 10.9 c c1 c2
# 4 2 F 6.3 6 6.5 a a1 a2
# 5 2 F 10.6 10.4 10.9 b b1 b2
# 6 2 F 11.1 11 11.4 c c1 c2
# 7 3 F 9.5 9.3 9.8 a a1 a2
# 8 3 F 13.1 13 13.5 b b1 b2
# 9 3 F 13.8 13.5 13.9 c c1 c2
#10 4 M 11.5 11.2 11.7 a a1 a2
#11 4 M 13.4 13.5 13.6 b b1 b2
#12 4 M 12.9 12.7 13.9 c c1 c2
The general idea is to reshape all columns matching "^[abc]" from wide to long, and then rebuild into a wide format according to your expected output.
CodePudding user response:
There are different solutions.
If your columns have names that don't follow a certain pattern, then I'd go with the sjmisc package.
sjmisc::reshape_longer(
olddata_wide ,
columns = list(
c("a", "b", "c"),
c("a1", "b1", "c1"),
c("a2", "b2", "c2")),
values.to = c("value", "value1", "value2"))
You could also use data.table.
melt(setDT(olddata_wide),
measure = patterns("^[^0-9]$","[abc] 1", "[abc] 2"),
variable.name = c("id"),
value.name = c("value","value1", "value2"))