I have a data frame containing 3000 items, each with a value and one of 4 types. I need to distribute all of the items among 127 people who live in 27 different areas.
df1:
items <- paste0("Item",1:3000)
types <- c("A", "B", "C", "D")
values <- runif(3000, min=0.1, max=10)
areas <- paste0("Area",1:27)
df1 <- data.frame (items)
df1$type <- types
df1$area <- rep(areas, length.out = nrow(df1))
df1$value <- values
A second data frame lists the people who live in each area.
df2:
names <- randomNames::randomNames(127, which.names = "first")
areas <- paste0("Area",1:27)
df2 <- data.frame (names)
df2$area <- rep(areas, length.out = nrow(df2))
My goal is to distribute all items equally (or as close as possible) among all people in each area, by type and value.
My first attempt to distribute them in Area1 was this:
library(dplyr)
# Items in Area1 of type A, ordered by descending value
y <- df1 %>%
  filter(area == "Area1" & type == "A") %>%
  arrange(desc(value))
# People in Area1 (the column created by data.frame(names) is called "names")
v <- df2 %>% filter(area == "Area1")
v <- unique(v$names)
# Assign the items to the people one by one, recycling the list of people
y$name <- rep(v, length.out = nrow(y))
# Items in Area1 of type B, ordered by descending value
z <- df1 %>%
  filter(area == "Area1" & type == "B") %>%
  arrange(desc(value))
# Assign them to the people one by one
z$name <- rep(v, length.out = nrow(z))
# Combine both types
Area1 <- rbind(y, z)
I'd like to automate this process with a loop or a function so that I can do the same for all 27 areas and all types, but I can't find a way to do it.
Any help would be greatly appreciated!
CodePudding user response:
How about this:
library(purrr)
library(dplyr)
items <- paste0("Item",1:3000)
types <- c("A", "B", "C", "D")
values <- runif(3000, min=0.1, max=10)
areas <- paste0("Area",1:27)
df1 <- data.frame (items)
df1$type <- types
df1$area <- rep(areas, length.out = nrow(df1))
df1$value <- values
names <- randomNames::randomNames(127, which.names = "first")
areas <- paste0("Area",1:27)
df2 <- data.frame (names)
df2$area <- rep(areas, length.out = nrow(df2))
# The arguments are named a and ty so that they do not clash with the column
# names: inside filter(), area == area would compare the column with itself
# and keep every row.
f <- function(a, ty){
  y <- df1 %>% filter(area == a & type == ty) %>%
    arrange(desc(value))
  # List of people in this area
  v <- df2 %>% filter(area == a)
  v <- unique(v$names)
  # Distribute them across all people 1 by 1
  y$name <- rep(v, length.out = nrow(y))
  y
}
area_type <- df1 %>% select(area, type) %>% distinct()
out <- map2(area_type$area, area_type$type, f)
out <- do.call(rbind, out)
head(out)
# Output omitted: the rows depend on the randomly generated values and names.
Created on 2022-11-21 by the reprex package (v2.0.1)
In the code above, I wrapped one area-type iteration of your code in a function. Then, I found the distinct area-type combinations in the data and used map2() from the purrr package to run the function on each observed combination of area and type. Finally, I put all the results in one data frame with rbind().
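As a small, self-contained illustration of the per-combination approach described above, here is a sketch on toy data that also checks how evenly the items end up being spread. The names items_toy, people_toy, assign_one, a and ty are hypothetical and chosen for this example only. The function arguments are deliberately named differently from the columns, because inside filter() a bare name is looked up among the columns first.

```r
library(dplyr)
library(purrr)

set.seed(1)  # toy data, much smaller than the data in the question

items_toy <- data.frame(
  item  = paste0("Item", 1:24),
  type  = rep(c("A", "B"), length.out = 24),
  area  = rep(c("Area1", "Area2"), each = 12),
  value = runif(24, min = 0.1, max = 10)
)
people_toy <- data.frame(
  person = c("P1", "P2", "P3", "P4", "P5"),
  area   = c("Area1", "Area1", "Area1", "Area2", "Area2")
)

# Assign the items of one area/type combination to that area's people,
# one by one, in order of descending value
assign_one <- function(a, ty) {
  y <- items_toy %>% filter(area == a, type == ty) %>% arrange(desc(value))
  v <- people_toy %>% filter(area == a) %>% pull(person)
  y$person <- rep(v, length.out = nrow(y))
  y
}

combos  <- items_toy %>% distinct(area, type)
toy_out <- map2(combos$area, combos$type, assign_one) %>% bind_rows()

# Number of items each person received per area and type
counts <- toy_out %>% count(area, type, person)
counts
```

Within each area and type, the counts per person should differ by at most one, which is what "as equally as possible" means for a one-by-one assignment. Note that this balances the number of items, not the total value each person receives.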
CodePudding user response:
Here is an approach that leverages data.table. After making sure that df2 is unique, I do a cartesian join on area. I sort the rows by type and value, and then use a helper function f() to identify the person to whom each item should be assigned:
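library(data.table)
library(magrittr)  # provides the %>% pipe used below
# f() concatenates every rotation of x (1 2 3, 2 3 1, 3 1 2, ...) and
# recycles the result to length l
f <- function(x,l) {
  v = as.vector(sapply(seq_along(x), \(i) c(x[i:length(x)],x[0:(i-1)])))
  rep(v,length.out=l)
}
# Pair every item with every person in its area, then keep, for each item,
# the one row where the rotating index nid equals 1
setDT(df1)[unique(setDT(df2)), on=.(area), allow.cartesian=TRUE] %>%
  .[order(type,-value)] %>%
  .[,nid:=f(1:uniqueN(names),.N), .(area)] %>%
  .[nid==1]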
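To see what the helper does, it can be run on its own. The sketch below copies f() from the code above and calls it for three people and nine rows. The variable names ids and pos are hypothetical and used only for this illustration.

```r
# Copy of the helper from the answer (the \(i) syntax needs R 4.1 or later)
f <- function(x, l) {
  v <- as.vector(sapply(seq_along(x), \(i) c(x[i:length(x)], x[0:(i - 1)])))
  rep(v, length.out = l)
}

ids <- f(1:3, 9)
ids
# 1 2 3 2 3 1 3 1 2   (the three rotations of 1:3, laid end to end)

# Position of the 1 inside each block of three
pos <- which(ids == 1)
pos
# 1 6 8
```

Each block of three corresponds to one item paired with the three people in its area. Because the 1 sits in a different position in each consecutive block, filtering on nid == 1 picks a different person for each successive item.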