Home > Net >  Transform dataframe to wide format based on two columns of categories and two columns with data
Transform dataframe to wide format based on two columns of categories and two columns with data

Time:09-03

I would like to transform two columns of data of my data frame to a wider format using two category columns, I found similar topics that used the reshape or melt functions, but was still not able to figure it out how it works when you have two id columns. I have the following data frame:

Testphase <- c("BG15_H_D_X","BG15_H_D_X","BG15_H_D_X","BG15_H_D_X","BG15_H_D_Y","BG15_H_D_Y","BG15_H_D_Y","BG15_H_D_Y")
    RPM <- c("100","100","200","200","100","100","200","200")
    Angle <- c("15","30","15","30","15","30","15","30")
    Data1 <- c(1.17,0.65,0.56,0.49,1.14,0.86,0.53,0.72)
    Data2 <- c(0.41,0.57,0.56,0.45,0.46,0.4,0.73,0.51)
    df <- data.frame(Testphase, RPM, Angle, Data1, Data2)

And I would like to transfer it to:

RPM <- c("100","100","200","200")
    Angle <- c("15","30","15","30")
    Data1.X <- c(1.17,0.65,0.56,0.49)
    Data1.Y <- c(1.14,0.86,0.53,0.72)
    Data2.X <- c(0.41,0.57,0.56,0.45)
    Data2.Y <- c(0.46,0.4,0.73,0.51)
    df_wide <-data.frame(RPM, Angle, Data1.X, Data1.Y, Data2.X, Data2.Y)

BONUS: just use the last letter of testphase to give name to the new created columns, so Data1.X instead of Data1.BG15_H_D_X.

CodePudding user response:

With pivot_wider, you can supply a glue specification that uses the names_from columns (and special .value) to create custom column names.

library(tidyr)

pivot_wider(df, names_from = Testphase, values_from = c(Data1, Data2),
            names_glue = "{.value}.{sub('.*_', '', Testphase)}")

# # A tibble: 4 × 6
#   RPM   Angle Data1.X Data1.Y Data2.X Data2.Y
#   <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
# 1 100   15       1.17    1.14    0.41    0.46
# 2 100   30       0.65    0.86    0.57    0.4 
# 3 200   15       0.56    0.53    0.56    0.73
# 4 200   30       0.49    0.72    0.45    0.51

CodePudding user response:

You could use separate and pivot_longer (from tidyr):

library(tidyr)

df |>
  separate(Testphase,
           sep = 9, # Splitting at 9th position, does take regex!
           into = c("Testphase", "name")) |>
  pivot_wider(id_cols = c("Testphase", "RPM", "Angle"),
              values_from = starts_with("Data")) |>
  select(-Testphase)

Output:

# A tibble: 4 × 7
  RPM   Angle Data1_X Data1_Y Data2_X Data2_Y
  <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 100   15       1.17    1.14    0.41    0.46
2 100   30       0.65    0.86    0.57    0.4 
3 200   15       0.56    0.53    0.56    0.73
4 200   30       0.49    0.72    0.45    0.51

Update: If Testphase doesn't contain information (earlier I assumed that there would be more rows with other phases), it simplifies to:

library(tidyr)

df |>
  mutate(Testphase = sub("BG15_H_D_", "", Testphase)) |>
  pivot_wider(names_from = Testphase,
              id_cols = c("RPM", "Angle"),
              values_from = starts_with("Data"))

Output:

# A tibble: 4 × 6
  RPM   Angle Data1_X Data1_Y Data2_X Data2_Y
  <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 100   15       1.17    1.14    0.41    0.46
2 100   30       0.65    0.86    0.57    0.4 
3 200   15       0.56    0.53    0.56    0.73
4 200   30       0.49    0.72    0.45    0.51

CodePudding user response:

Thanks to Harre's suggestion for the separate function, I was in the end also able to do it using the reshape function which includes using the drop argument to get to the right result.

df_wide <- df %>% 
  separate(Testphase, sep = 9, into = c("Testphase", "name")) %>%
  reshape(idvar = c("Angle","RPM"), timevar = "name", direction = "wide", drop = "Testphase")

CodePudding user response:

Using dcast

library(data.table)
 dcast(setDT(df), RPM   Angle ~ trimws(Testphase, whitespace = ".*_"),
      value.var = c("Data1", "Data2"), sep = ".")

-output

Key: <RPM, Angle>
      RPM  Angle Data1.X Data1.Y Data2.X Data2.Y
   <char> <char>   <num>   <num>   <num>   <num>
1:    100     15    1.17    1.14    0.41    0.46
2:    100     30    0.65    0.86    0.57    0.40
3:    200     15    0.56    0.53    0.56    0.73
4:    200     30    0.49    0.72    0.45    0.51
  • Related