Dataframe consist of 3 rows: wine_id, taste_group and and evaluated matching score for each of that group:
wine_id | taste_group | score |
---|---|---|
22 | tree_fruit | 87 |
22 | citrus_fruit | 98 |
22 | tropical_fruit | 17 |
22 | earth | 8 |
22 | microbio | 6 |
22 | oak | 7 |
22 | vegetal | 1 |
How to achieve to make a separate column for each taste_group and to list scores in rows? Hence this:
wine_id | tree_fruit | citrus_fruit | tropical_fruit | earth | microbio | oak | vegetal |
---|---|---|---|---|---|---|---|
22 | 87 | 98 | 17 | 8 | 6 | 7 | 1 |
There are 13 taste groups overall, along with more than 6000 Wines. If the wine doesn't have a score for taste_group row takes value 0.
I used
length(unique(tastes$Group))
length(unique(tastes$Wine_Id))
in R to question basic measures. How to proceed to wanted format?
CodePudding user response:
Assuming your dataframe is named tastes
, you'll want something like:
library(tidyr)
tastes %>%
# Get into desired wide format
pivot_wider(names_from = taste_group, values_from = score, values_fill = 0)
CodePudding user response:
In R
, this is called as the long-to-wide reshaping, you can also use dcast
to do that.
library(data.table)
dt <- fread("
wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1
")
dcast(dt, wine_id ~ taste_group, value.var = "score")
#wine_id citrus_fruit earth microbio oak tree_fruit tropical_fruit vegetal
# <int> <int> <int> <int> <int> <int> <int> <int>
# 22 98 8 6 7 87 17 1