My data "x" looks like this:
date type cost
20-01 supp 5
20-02 supp 10
20-03 supp 5
20-01 svcs 2
20-02 svcs 4
20-03 svcs 8
I want to create a cost for each type in order to plot a multiple time series, which I can do by creating two separate time series, but what I'd like to do is create:
bydate <- aggregate(cbind(supp, svcs)~date, data=y, FUN=sum)
With my data "y" looking like this:
date type supp svcs
20-01 supp 5 0
20-02 supp 10 0
20-03 supp 5 0
20-01 svcs 0 2
20-02 svcs 0 4
20-03 svcs 0 8
In this way I hope to create
df <- bydate %>%
select(date, supp, svcs) %>%
gather(key = "variable", value = "value", -date)
Is the mutate function the way to do this?
CodePudding user response:
We have to create index variables before pivot
ing, and then pivot_wider with the valueS_fill
argument set to 0
.
library(tidyr)
library(dplyr)
df %>%
mutate(index = row_number(),
type2 = type) %>%
pivot_wider(names_from = type2, values_from = cost, values_fill = 0) %>%
select(-index)
# A tibble: 6 × 4
date type supp svcs
<chr> <chr> <dbl> <dbl>
1 20-01 supp 5 0
2 20-02 supp 10 0
3 20-03 supp 5 0
4 20-01 svcs 0 2
5 20-02 svcs 0 4
6 20-03 svcs 0 8
CodePudding user response:
Here is a version using bind_cols
library(dplyr)
library(tidyr)
x %>%
dplyr::mutate(id = row_number()) %>%
pivot_wider(
names_from = type,
values_from = cost,
values_fill = 0
) %>%
bind_cols(type = x$type) %>%
select(date, type, everything(), -id)
date type supp svcs
<chr> <chr> <int> <int>
1 20-01 supp 5 0
2 20-02 supp 10 0
3 20-03 supp 5 0
4 20-01 svcs 0 2
5 20-02 svcs 0 4
6 20-03 svcs 0 8
CodePudding user response:
For this kind of problem we do not necessarily need data rectangling. An alternative is to use purrr::map_dfc
inside dplyr::mutate
together with purrr::set_names
:
library(dplyr)
library(purrr)
df %>%
mutate(map_dfc(set_names(unique(type)),
~ ifelse(.x == type, cost, 0))
)
#> date type cost supp scvs
#> 1 20-01 supp 5 5 0
#> 2 20-02 supp 10 10 0
#> 3 20-03 supp 5 5 0
#> 4 20-01 scvs 2 0 2
#> 5 20-02 scvs 4 0 4
#> 6 20-03 scvs 8 0 8
To simplify this and similar problems I have a package on Github. In this case we could use over
together with dist_values
:
library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover
df %>%
mutate(over(dist_values(type),
~ ifelse(.x == type, cost, 0))
)
#> date type cost scvs supp
#> 1 20-01 supp 5 0 5
#> 2 20-02 supp 10 0 10
#> 3 20-03 supp 5 0 5
#> 4 20-01 scvs 2 2 0
#> 5 20-02 scvs 4 4 0
#> 6 20-03 scvs 8 8 0
Created on 2021-12-30 by the reprex package (v0.3.0)
data:
df <- data.frame(
date = rep(c("20-01", "20-02", "20-03"), 2),
type = rep(c("supp", "scvs"), each = 3),
cost = c(5, 10, 5, 2, 4, 8)
)
Created on 2021-12-30 by the reprex package (v0.3.0)