Home > Software engineering >  Transformation from long to wide with multiple observations in R
Transformation from long to wide with multiple observations in R

Time:12-17

I want to transform a data set from long to wide. The data contains multiple observations for each time point.

To illustrate, consider the following two examples.

In EXAMPLE 1 below, the data does not contain multiple observations and can be transformed from long to wide.

In EXAMPLE 2 below, the data does contain multiple observations (n=3 per time point) and cannot be transformed from long to wide, testing with dcast and pivot_wider.

Can anyone suggest a method to transform the test data from EXAMPLE 2 into a valid format?

Code to reproduce the problem:

library(ggplot2)
library(ggcorrplot)
library(reshape2)
library(tidyr)
library(data.table)

# EXAMPLE 1 (does work)
# Test data
set.seed(5)
time    <- rep(c(0,10), 1, each = 2)
feature <- rep(c("feat1", "feat2"), 2)
values  <- runif(4, min=0, max=1)

# Concatenate test data
# test has non-unique values in time column
test    <- data.table(time, feature, values)

# Transform data into wide format
test_wide <- dcast(test, time ~ feature, value.var = 'values')

# EXAMPLE 2 (does not work)
# Test data
set.seed(5)
time    <- rep(c(0,10), 2, each = 6)
feature <- c(rep("feat1", 12), rep("feat2", 12))
values  <- runif(24, min=0, max=1)

# Concatenate test data
# test has non-unique values in time column
test    <- data.table(time, feature, values)

# Transform data into wide format
test_wide <- dcast(test, time ~ feature, value.var = 'values')

Warning:

Aggregate function missing, defaulting to 'length'

Problem:

Non-unique values in first column (time) are not preserved/allowed.

# Testing with pivot_wider
test_wider <- pivot_wider(test, names_from = feature, values_from = values)

Warning:

Warning message:
Values are not uniquely identified; output will contain list-cols.

Problem:

Non-unique values in first column (time) are not preserved/allowed.

In lack of a better idea, a possible output could look like this:

time feat1 feat2
0 0.1046501 0.5279600
0 0.7010575 0.8079352
0 0.2002145 0.9565001

etc.

CodePudding user response:

Since there are multiple values, it is not obvious how these should be treated when converting to a wide format. That's why you get the warning messages. This is one way of handling them. If you want something else, then please give a specific example of what the output should look like.

pivot_wider(test, names_from = feature, values_from = values) %>% 
    unnest(c(feat1, feat2))

CodePudding user response:

You may want something like this:

library(dplyr)
test %>% 
  pivot_wider(names_from = c(feature, time), 
              values_from = values)

where the c(feature, times) accounts for the multiple variable case. But as was already pointed out in the comments please indicate if you want something else.

  • Related