Question:
Below works, but is there a better "R way" of achieving similar result? I am essentially trying to create / distribute groups into individual line items according to a user defined function (currently just using a loop).
Example:
df1 <- data.frame(group = c("A", "B", "C"),
volume = c(200L, 45L, 104L)
)
print(df1)
#> group volume
#> 1 A 200
#> 2 B 45
#> 3 C 104
I want the volume
to be broken across multiple rows according to group so that the final result is a dataframe where the new volume (vol2
in the below) would add up to original volume above. In this example, I'm applying integer math with a divisor of 52, so my final result should be:
print(df3)
#> group vol2
#> 1 A 52
#> 2 A 52
#> 3 A 52
#> 4 A 44
#> 21 B 45
#> 31 C 52
#> 32 C 52
This works
The code below DOES get me to the desired result shown above:
div <- 52L
df1$intgr <- df1$volume %/% div
df1$remainder <- df1$volume %% div
print(df1)
#> group volume intgr remainder
#> 1 A 200 3 44
#> 2 B 45 0 45
#> 3 C 104 2 0
df2 <- data.frame()
for (r in 1:nrow(df1)){
if(df1[r,"intgr"] > 0){
for (k in 1:as.integer(df1[r,"intgr"])){
df1[r,"vol2"] <- div
df2 <- rbind(df2, df1[r,])
}
}
if(df1[r,"remainder"]>0){
df1[r, "vol2"] <- as.integer(df1[r, "remainder"])
df2 <- rbind(df2, df1[r,])
}
}
print(df2)
#> group volume intgr remainder vol2
#> 1 A 200 3 44 52
#> 2 A 200 3 44 52
#> 3 A 200 3 44 52
#> 4 A 200 3 44 44
#> 21 B 45 0 45 45
#> 31 C 104 2 0 52
#> 32 C 104 2 0 52
df3 <- subset(df2, select = c("group", "vol2"))
print(df3)
#> group vol2
#> 1 A 52
#> 2 A 52
#> 3 A 52
#> 4 A 44
#> 21 B 45
#> 31 C 52
#> 32 C 52
Being still relatively new to R, I'm just curious if someone knows a better way / function / method that gets to the same place. Seems like there might be. I could potentially have a more complex way of breaking up the rows and I was thinking maybe there's a method that applies a UDF to the dataframe to do something like this. I was searching for "expand group/groups" but was finding mostly "expand.grid" which isn't what I'm doing here.
Thank you for any suggestions!
CodePudding user response:
A quick function to help split each number by the modulus,
fun <- function(num, mod) c(rep(mod, ceiling(num / mod)-1), (num-1) %% mod 1)
fun(200, 52)
# [1] 52 52 52 44
fun(45, 52)
# [1] 45
fun(104, 52)
# [1] 52 52
And we can apply this a number of ways:
library(dplyr)
df1 %>%
group_by(group) %>%
summarize(vol2 = fun(volume, 52), .groups = "drop")
# # A tibble: 7 x 2
# group vol2
# <chr> <dbl>
# 1 A 52
# 2 A 52
# 3 A 52
# 4 A 44
# 5 B 45
# 6 C 52
# 7 C 52
do.call(rbind, by(df1, seq(nrow(df1)),
FUN = function(z) data.frame(group = z$group, vol2 = fun(z$volume, 52))))
# group vol2
# 1.1 A 52
# 1.2 A 52
# 1.3 A 52
# 1.4 A 44
# 2 B 45
# 3.1 C 52
# 3.2 C 52
library(data.table)
setDT(df1)
df1[, .(vol2 = fun(volume, 52)), by = group]
# group vol2
# <char> <num>
# 1: A 52
# 2: A 52
# 3: A 52
# 4: A 44
# 5: B 45
# 6: C 52
# 7: C 52
CodePudding user response:
A tidyverse
approach using purrr::pmap
and tidyr::unnest_longer
may look like so:
library(dplyr, w = FALSE)
library(tidyr)
library(purrr)
div <- 52
df1 |>
mutate(intgr = volume %/% div, remainder = volume %% div, intgr1 = (remainder > 0)) |>
mutate(vol2 = purrr::pmap(list(intgr, intgr1, remainder), ~ c(rep(div, ..1), rep(..3, ..2)))) |>
tidyr::unnest_longer(vol2) |>
select(-intgr1)
#> # A tibble: 7 × 5
#> group volume intgr remainder vol2
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 A 200 3 44 52
#> 2 A 200 3 44 52
#> 3 A 200 3 44 52
#> 4 A 200 3 44 44
#> 5 B 45 0 45 45
#> 6 C 104 2 0 52
#> 7 C 104 2 0 52
CodePudding user response:
With data.table
and rep
:
library(data.table)
setDT(df1)[, .(vol2 = c(rep(52, volume%/R), (volume%R)[sign(volume%R)])), group][]
#> group vol2
#> 1: A 52
#> 2: A 52
#> 3: A 52
#> 4: A 44
#> 5: B 45
#> 6: C 52
#> 7: C 52
Or
setDT(df1)[, .(vol2 = c(rep(52, volume%/R), volume%R)), group][vol2 != 0][]
#> group vol2
#> 1: A 52
#> 2: A 52
#> 3: A 52
#> 4: A 44
#> 5: B 45
#> 6: C 52
#> 7: C 52
CodePudding user response:
Vectorised and without grouping:
df1 <- data.frame(group = c("A", "B", "C"),
volume = c(200L, 45L, 104L))
n <- 52
idx <- df1$volume %/% n ((sel <- df1$volume %% n) != 0)
out <- df1[rep(seq_len(nrow(df1)), idx),]
out$volume <- n
out$volume[cumsum(idx)[sel != 0]] <- sel[sel != 0]
## group volume
##1 A 52
##1.1 A 52
##1.2 A 52
##1.3 A 44
##2 B 45
##3 C 52
##3.1 C 52