I would like to transform two columns of data of my data frame to a wider format using two category columns, I found similar topics that used the reshape or melt functions, but was still not able to figure it out how it works when you have two id columns. I have the following data frame:
Testphase <- c("BG15_H_D_X","BG15_H_D_X","BG15_H_D_X","BG15_H_D_X","BG15_H_D_Y","BG15_H_D_Y","BG15_H_D_Y","BG15_H_D_Y")
RPM <- c("100","100","200","200","100","100","200","200")
Angle <- c("15","30","15","30","15","30","15","30")
Data1 <- c(1.17,0.65,0.56,0.49,1.14,0.86,0.53,0.72)
Data2 <- c(0.41,0.57,0.56,0.45,0.46,0.4,0.73,0.51)
df <- data.frame(Testphase, RPM, Angle, Data1, Data2)
And I would like to transfer it to:
RPM <- c("100","100","200","200")
Angle <- c("15","30","15","30")
Data1.X <- c(1.17,0.65,0.56,0.49)
Data1.Y <- c(1.14,0.86,0.53,0.72)
Data2.X <- c(0.41,0.57,0.56,0.45)
Data2.Y <- c(0.46,0.4,0.73,0.51)
df_wide <-data.frame(RPM, Angle, Data1.X, Data1.Y, Data2.X, Data2.Y)
BONUS: just use the last letter of testphase to give name to the new created columns, so Data1.X instead of Data1.BG15_H_D_X.
CodePudding user response:
With pivot_wider
, you can supply a glue specification that uses the names_from
columns (and special .value
) to create custom column names.
library(tidyr)
pivot_wider(df, names_from = Testphase, values_from = c(Data1, Data2),
names_glue = "{.value}.{sub('.*_', '', Testphase)}")
# # A tibble: 4 × 6
# RPM Angle Data1.X Data1.Y Data2.X Data2.Y
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 100 15 1.17 1.14 0.41 0.46
# 2 100 30 0.65 0.86 0.57 0.4
# 3 200 15 0.56 0.53 0.56 0.73
# 4 200 30 0.49 0.72 0.45 0.51
CodePudding user response:
You could use separate
and pivot_longer
(from tidyr
):
library(tidyr)
df |>
separate(Testphase,
sep = 9, # Splitting at 9th position, does take regex!
into = c("Testphase", "name")) |>
pivot_wider(id_cols = c("Testphase", "RPM", "Angle"),
values_from = starts_with("Data")) |>
select(-Testphase)
Output:
# A tibble: 4 × 7
RPM Angle Data1_X Data1_Y Data2_X Data2_Y
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 100 15 1.17 1.14 0.41 0.46
2 100 30 0.65 0.86 0.57 0.4
3 200 15 0.56 0.53 0.56 0.73
4 200 30 0.49 0.72 0.45 0.51
Update: If Testphase
doesn't contain information (earlier I assumed that there would be more rows with other phases), it simplifies to:
library(tidyr)
df |>
mutate(Testphase = sub("BG15_H_D_", "", Testphase)) |>
pivot_wider(names_from = Testphase,
id_cols = c("RPM", "Angle"),
values_from = starts_with("Data"))
Output:
# A tibble: 4 × 6
RPM Angle Data1_X Data1_Y Data2_X Data2_Y
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 100 15 1.17 1.14 0.41 0.46
2 100 30 0.65 0.86 0.57 0.4
3 200 15 0.56 0.53 0.56 0.73
4 200 30 0.49 0.72 0.45 0.51
CodePudding user response:
Thanks to Harre's suggestion for the separate
function, I was in the end also able to do it using the reshape function which includes using the drop
argument to get to the right result.
df_wide <- df %>%
separate(Testphase, sep = 9, into = c("Testphase", "name")) %>%
reshape(idvar = c("Angle","RPM"), timevar = "name", direction = "wide", drop = "Testphase")
CodePudding user response:
Using dcast
library(data.table)
dcast(setDT(df), RPM Angle ~ trimws(Testphase, whitespace = ".*_"),
value.var = c("Data1", "Data2"), sep = ".")
-output
Key: <RPM, Angle>
RPM Angle Data1.X Data1.Y Data2.X Data2.Y
<char> <char> <num> <num> <num> <num>
1: 100 15 1.17 1.14 0.41 0.46
2: 100 30 0.65 0.86 0.57 0.40
3: 200 15 0.56 0.53 0.56 0.73
4: 200 30 0.49 0.72 0.45 0.51