Home > Software design >  Transforming big dataframe in more sensible form
Transforming big dataframe in more sensible form

Time:12-26

Dataframe consist of 3 rows: wine_id, taste_group and and evaluated matching score for each of that group:

wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1

How to achieve to make a separate column for each taste_group and to list scores in rows? Hence this:

wine_id tree_fruit citrus_fruit tropical_fruit earth microbio oak vegetal
22 87 98 17 8 6 7 1

There are 13 taste groups overall, along with more than 6000 Wines. If the wine doesn't have a score for taste_group row takes value 0.

I used

length(unique(tastes$Group))
length(unique(tastes$Wine_Id))

in R to question basic measures. How to proceed to wanted format?

CodePudding user response:

Assuming your dataframe is named tastes, you'll want something like:

library(tidyr)

tastes %>%
    # Get into desired wide format
    pivot_wider(names_from = taste_group, values_from = score, values_fill = 0)

CodePudding user response:

In R, this is called as the long-to-wide reshaping, you can also use dcast to do that.

library(data.table)

dt <- fread("
wine_id taste_group score
22  tree_fruit  87
22  citrus_fruit    98
22  tropical_fruit  17
22  earth   8
22  microbio    6
22  oak 7
22  vegetal 1
")

dcast(dt, wine_id ~ taste_group, value.var = "score")

#wine_id citrus_fruit earth microbio   oak tree_fruit tropical_fruit vegetal
#     <int>        <int> <int>    <int> <int>      <int>          <int>   <int>
#      22           98     8        6     7         87             17       1
  • Related