Transform dataframe to wide format based on two columns of categories and two columns with data-CodePudding

I would like to transform two columns of data of my data frame to a wider format using two category columns, I found similar topics that used the reshape or melt functions, but was still not able to figure it out how it works when you have two id columns. I have the following data frame:

Testphase <- c("BG15_H_D_X","BG15_H_D_X","BG15_H_D_X","BG15_H_D_X","BG15_H_D_Y","BG15_H_D_Y","BG15_H_D_Y","BG15_H_D_Y")
    RPM <- c("100","100","200","200","100","100","200","200")
    Angle <- c("15","30","15","30","15","30","15","30")
    Data1 <- c(1.17,0.65,0.56,0.49,1.14,0.86,0.53,0.72)
    Data2 <- c(0.41,0.57,0.56,0.45,0.46,0.4,0.73,0.51)
    df <- data.frame(Testphase, RPM, Angle, Data1, Data2)

And I would like to transfer it to:

RPM <- c("100","100","200","200")
    Angle <- c("15","30","15","30")
    Data1.X <- c(1.17,0.65,0.56,0.49)
    Data1.Y <- c(1.14,0.86,0.53,0.72)
    Data2.X <- c(0.41,0.57,0.56,0.45)
    Data2.Y <- c(0.46,0.4,0.73,0.51)
    df_wide <-data.frame(RPM, Angle, Data1.X, Data1.Y, Data2.X, Data2.Y)

BONUS: just use the last letter of testphase to give name to the new created columns, so Data1.X instead of Data1.BG15_H_D_X.

CodePudding user response：

With pivot_wider, you can supply a glue specification that uses the names_from columns (and special .value) to create custom column names.

library(tidyr)

pivot_wider(df, names_from = Testphase, values_from = c(Data1, Data2),
            names_glue = "{.value}.{sub('.*_', '', Testphase)}")

# # A tibble: 4 × 6
#   RPM   Angle Data1.X Data1.Y Data2.X Data2.Y
#   <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
# 1 100   15       1.17    1.14    0.41    0.46
# 2 100   30       0.65    0.86    0.57    0.4 
# 3 200   15       0.56    0.53    0.56    0.73
# 4 200   30       0.49    0.72    0.45    0.51

CodePudding user response：

You could use separate and pivot_longer (from tidyr):

library(tidyr)

df |>
  separate(Testphase,
           sep = 9, # Splitting at 9th position, does take regex!
           into = c("Testphase", "name")) |>
  pivot_wider(id_cols = c("Testphase", "RPM", "Angle"),
              values_from = starts_with("Data")) |>
  select(-Testphase)

Output:

# A tibble: 4 × 7
  RPM   Angle Data1_X Data1_Y Data2_X Data2_Y
  <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 100   15       1.17    1.14    0.41    0.46
2 100   30       0.65    0.86    0.57    0.4 
3 200   15       0.56    0.53    0.56    0.73
4 200   30       0.49    0.72    0.45    0.51

Update: If Testphase doesn't contain information (earlier I assumed that there would be more rows with other phases), it simplifies to:

library(tidyr)

df |>
  mutate(Testphase = sub("BG15_H_D_", "", Testphase)) |>
  pivot_wider(names_from = Testphase,
              id_cols = c("RPM", "Angle"),
              values_from = starts_with("Data"))

Output:

# A tibble: 4 × 6
  RPM   Angle Data1_X Data1_Y Data2_X Data2_Y
  <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 100   15       1.17    1.14    0.41    0.46
2 100   30       0.65    0.86    0.57    0.4 
3 200   15       0.56    0.53    0.56    0.73
4 200   30       0.49    0.72    0.45    0.51

CodePudding user response：

Thanks to Harre's suggestion for the separate function, I was in the end also able to do it using the reshape function which includes using the drop argument to get to the right result.

df_wide <- df %>% 
  separate(Testphase, sep = 9, into = c("Testphase", "name")) %>%
  reshape(idvar = c("Angle","RPM"), timevar = "name", direction = "wide", drop = "Testphase")

CodePudding user response：

Using dcast

library(data.table)
 dcast(setDT(df), RPM   Angle ~ trimws(Testphase, whitespace = ".*_"),
      value.var = c("Data1", "Data2"), sep = ".")

-output

Key: <RPM, Angle>
      RPM  Angle Data1.X Data1.Y Data2.X Data2.Y
   <char> <char>   <num>   <num>   <num>   <num>
1:    100     15    1.17    1.14    0.41    0.46
2:    100     30    0.65    0.86    0.57    0.40
3:    200     15    0.56    0.53    0.56    0.73
4:    200     30    0.49    0.72    0.45    0.51