I have a dataframe which looks like this (but >10k rows).
df <- data.frame(id=1:2,
feature1=1:2,
feature2=4:5,
feature3=7:8)
And want to reshape it looks something like this:
df_goal <- data.frame(value1=c(1,2,1,2,1,2),
value2=c(1,2,4,5,7,8),
x=c("feature1","feature1","feature1","feature1","feature1","feature1"),
y=c("feature1","feature1","feature2","feature2","feature3","feature3"))
df_goal
presents only a subset of the final df that I hope to achieve. Ultimatly, I hope to have every combination of feature1
, feature2
and feature3
and the corresponding values.
I want to use the resulting df for ggplot.
I think there should be a straightforward tidyr
solution for my problem, but I cant seem to find it.
Any help is much appreciated!
CodePudding user response:
Here's a starting point: pivot
the data set, then compute the combn
ations (unnest
and asplit
are just here to make it a dataframe in the end).
library(tidyr)
library(dplyr) #requires dplyr 1.1.0
data1 <-
pivot_longer(df, -id, names_to = "feature") %>%
reframe(across(c(value, feature), ~ asplit(t(combn(.x, 2)), 1)),
.by = id) %>%
unnest_wider(c(value, feature), names_sep = "")
# A tibble: 6 × 5
# id value1 value2 feature1 feature2
#1 1 1 4 feature1 feature2
#2 1 1 7 feature1 feature3
#3 1 4 7 feature2 feature3
#4 2 2 5 feature1 feature2
#5 2 2 8 feature1 feature3
#6 2 5 8 feature2 feature3
To get the remaining rows, this could do it:
data2 <-
pivot_longer(df, -id, names_to = "feature") %>%
mutate(feature1 = feature, feature2 = feature,
value1 = value, value2 = value, .keep = "unused")
# A tibble: 6 × 5
# id feature1 feature2 value1 value2
#1 1 feature1 feature1 1 1
#2 1 feature2 feature2 4 4
#3 1 feature3 feature3 7 7
#4 2 feature1 feature1 2 2
#5 2 feature2 feature2 5 5
#6 2 feature3 feature3 8 8
Then, altogether:
bind_rows(data1, data2) %>%
arrange(id, feature1, feature2)
# A tibble: 12 × 5
id value1 value2 feature1 feature2
1 1 1 1 feature1 feature1
2 1 1 4 feature1 feature2
3 1 1 7 feature1 feature3
4 1 4 4 feature2 feature2
5 1 4 7 feature2 feature3
6 1 7 7 feature3 feature3
7 2 2 2 feature1 feature1
8 2 2 5 feature1 feature2
9 2 2 8 feature1 feature3
10 2 5 5 feature2 feature2
11 2 5 8 feature2 feature3
12 2 8 8 feature3 feature3
CodePudding user response:
Might be a case of pivoting, renaming and full_join
ing the dataframe to itself?
df <- data.frame(
id = 1:2,
feature1 = 1:2,
feature2 = 4:5,
feature3 = 7:8
)
library(tidyverse)
df_longer <- pivot_longer(df,-id)
df_longer |>
rename(name1 = name, value1 = value) |>
full_join(df_longer |> rename(name2 = name, value2 = value),
by = "id") |>
select(id, value1, value2, name1, name2) |>
arrange(name1, name2)
#> # A tibble: 18 × 5
#> id value1 value2 name1 name2
#> <int> <int> <int> <chr> <chr>
#> 1 1 1 1 feature1 feature1
#> 2 2 2 2 feature1 feature1
#> 3 1 1 4 feature1 feature2
#> 4 2 2 5 feature1 feature2
#> 5 1 1 7 feature1 feature3
#> 6 2 2 8 feature1 feature3
#> 7 1 4 1 feature2 feature1
#> 8 2 5 2 feature2 feature1
#> 9 1 4 4 feature2 feature2
#> 10 2 5 5 feature2 feature2
#> 11 1 4 7 feature2 feature3
#> 12 2 5 8 feature2 feature3
#> 13 1 7 1 feature3 feature1
#> 14 2 8 2 feature3 feature1
#> 15 1 7 4 feature3 feature2
#> 16 2 8 5 feature3 feature2
#> 17 1 7 7 feature3 feature3
#> 18 2 8 8 feature3 feature3
This seems to match the six rows of the output dataset you've provided above - is this what you're looking for?