I have a data set that looks like this:
> newex
Name Volume Period
1 oil 29000 Jun 21
2 gold 800 Mar 22
3 oil 21000 Jul 21
4 gold 1100 Sep 21
5 gold 3000 Feb 21
6 depower 3 Q1 21
7 oil 23000 Apr 22
8 czpower 26 Q1 23
9 oil 17000 Q1 21
10 gold 2400 May 21
11 oil 12000 Q2 21
12 gold 1800 Jan 22
13 czpower 21 Oct 21
14 api2coal 6000 Q1 22
15 api2coal 11000 Q1 21
16 depower 11 Jan 22
17 api2coal 16000 Jul 21
18 gold 1300 Mar 21
19 depower 3 Q1 22
20 oil 17000 Cal 21
I would like to reshape the data set so as to obtain a data frame with the following characteristics:
- The values in
Name
will become the new variables (columns); - The values in
Period
will become the indexes (should be unique); - The values in
Volume
are the sums of the values of each combination ofName
andPeriod
.
Can someone please give me a hint as to how to achieve this? Thank you in advance.
CodePudding user response:
library(data.table)
dcast(setDT(newex),Period~Name, value.var="Volume",fun.aggregate = sum)
Output:
Period api2coal czpower depower gold oil
1: Apr 22 0 0 0 0 23000
2: Cal 21 0 0 0 0 17000
3: Feb 21 0 0 0 3000 0
4: Jan 22 0 0 11 1800 0
5: Jul 21 16000 0 0 0 21000
6: Jun 21 0 0 0 0 29000
7: Mar 21 0 0 0 1300 0
8: Mar 22 0 0 0 800 0
9: May 21 0 0 0 2400 0
10: Oct 21 0 21 0 0 0
11: Q1 21 11000 0 3 0 17000
12: Q1 22 6000 0 3 0 0
13: Q1 23 0 26 0 0 0
14: Q2 21 0 0 0 0 12000
15: Sep 21 0 0 0 1100 0
CodePudding user response:
We could use pivot_wider
:
library(dplyr)
library(tidyr)
newex %>%
pivot_wider(
names_from = Name,
values_from = Volume
)
Period oil gold depower czpower api2coal
<chr> <int> <int> <int> <int> <int>
1 Jun 21 29000 NA NA NA NA
2 Mar 22 NA 800 NA NA NA
3 Jul 21 21000 NA NA NA 16000
4 Sep 21 NA 1100 NA NA NA
5 Feb 21 NA 3000 NA NA NA
6 Q1 21 17000 NA 3 NA 11000
7 Apr 22 23000 NA NA NA NA
8 Q1 23 NA NA NA 26 NA
9 May 21 NA 2400 NA NA NA
10 Q2 21 12000 NA NA NA NA
11 Jan 22 NA 1800 11 NA NA
12 Oct 21 NA NA NA 21 NA
13 Q1 22 NA NA 3 NA 6000
14 Mar 21 NA 1300 NA NA NA
15 Cal 21 17000 NA NA NA NA
data:
newex <- structure(list(Name = c("oil", "gold", "oil", "gold", "gold",
"depower", "oil", "czpower", "oil", "gold", "oil", "gold", "czpower",
"api2coal", "api2coal", "depower", "api2coal", "gold", "depower",
"oil"), Volume = c(29000L, 800L, 21000L, 1100L, 3000L, 3L, 23000L,
26L, 17000L, 2400L, 12000L, 1800L, 21L, 6000L, 11000L, 11L, 16000L,
1300L, 3L, 17000L), Period = c("Jun 21", "Mar 22", "Jul 21",
"Sep 21", "Feb 21", "Q1 21", "Apr 22", "Q1 23", "Q1 21", "May 21",
"Q2 21", "Jan 22", "Oct 21", "Q1 22", "Q1 21", "Jan 22", "Jul 21",
"Mar 21", "Q1 22", "Cal 21")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))
CodePudding user response:
I found the answer after trying the previous answer:
newex %>%
pivot_wider(
names_from = Name,
values_from = Volume, values_fn = sum)