Home > Enterprise >  R Aggregate data frame based on column values
R Aggregate data frame based on column values

Time:05-24

I have a data set that looks like this:

> newex
        Name Volume Period
1        oil  29000 Jun 21
2       gold    800 Mar 22
3        oil  21000 Jul 21
4       gold   1100 Sep 21
5       gold   3000 Feb 21
6    depower      3  Q1 21
7        oil  23000 Apr 22
8    czpower     26  Q1 23
9        oil  17000  Q1 21
10      gold   2400 May 21
11       oil  12000  Q2 21
12      gold   1800 Jan 22
13   czpower     21 Oct 21
14  api2coal   6000  Q1 22
15  api2coal  11000  Q1 21
16   depower     11 Jan 22
17  api2coal  16000 Jul 21
18      gold   1300 Mar 21
19   depower      3  Q1 22
20       oil  17000 Cal 21

I would like to reshape the data set so as to obtain a data frame with the following characteristics:

  • The values in Name will become the new variables (columns);
  • The values in Period will become the indexes (should be unique);
  • The values in Volume are the sums of the values of each combination of Name and Period.

Can someone please give me a hint as to how to achieve this? Thank you in advance.

CodePudding user response:

library(data.table)
dcast(setDT(newex),Period~Name, value.var="Volume",fun.aggregate = sum)

Output:

    Period api2coal czpower depower gold   oil
 1: Apr 22        0       0       0    0 23000
 2: Cal 21        0       0       0    0 17000
 3: Feb 21        0       0       0 3000     0
 4: Jan 22        0       0      11 1800     0
 5: Jul 21    16000       0       0    0 21000
 6: Jun 21        0       0       0    0 29000
 7: Mar 21        0       0       0 1300     0
 8: Mar 22        0       0       0  800     0
 9: May 21        0       0       0 2400     0
10: Oct 21        0      21       0    0     0
11: Q1  21    11000       0       3    0 17000
12: Q1  22     6000       0       3    0     0
13: Q1  23        0      26       0    0     0
14: Q2  21        0       0       0    0 12000
15: Sep 21        0       0       0 1100     0

CodePudding user response:

We could use pivot_wider:

library(dplyr)
library(tidyr)

newex %>% 
  pivot_wider(
    names_from = Name,
    values_from = Volume
  )
   Period   oil  gold depower czpower api2coal
   <chr>  <int> <int>   <int>   <int>    <int>
 1 Jun 21 29000    NA      NA      NA       NA
 2 Mar 22    NA   800      NA      NA       NA
 3 Jul 21 21000    NA      NA      NA    16000
 4 Sep 21    NA  1100      NA      NA       NA
 5 Feb 21    NA  3000      NA      NA       NA
 6 Q1  21 17000    NA       3      NA    11000
 7 Apr 22 23000    NA      NA      NA       NA
 8 Q1  23    NA    NA      NA      26       NA
 9 May 21    NA  2400      NA      NA       NA
10 Q2  21 12000    NA      NA      NA       NA
11 Jan 22    NA  1800      11      NA       NA
12 Oct 21    NA    NA      NA      21       NA
13 Q1  22    NA    NA       3      NA     6000
14 Mar 21    NA  1300      NA      NA       NA
15 Cal 21 17000    NA      NA      NA       NA

data:

newex <- structure(list(Name = c("oil", "gold", "oil", "gold", "gold", 
"depower", "oil", "czpower", "oil", "gold", "oil", "gold", "czpower", 
"api2coal", "api2coal", "depower", "api2coal", "gold", "depower", 
"oil"), Volume = c(29000L, 800L, 21000L, 1100L, 3000L, 3L, 23000L, 
26L, 17000L, 2400L, 12000L, 1800L, 21L, 6000L, 11000L, 11L, 16000L, 
1300L, 3L, 17000L), Period = c("Jun 21", "Mar 22", "Jul 21", 
"Sep 21", "Feb 21", "Q1  21", "Apr 22", "Q1  23", "Q1  21", "May 21", 
"Q2  21", "Jan 22", "Oct 21", "Q1  22", "Q1  21", "Jan 22", "Jul 21", 
"Mar 21", "Q1  22", "Cal 21")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20"))

CodePudding user response:

I found the answer after trying the previous answer:

newex %>% 
 pivot_wider(
     names_from = Name,
     values_from = Volume, values_fn = sum)
  • Related