How to use dplyr to fill in gaps in ranked and sorted index?

Time:07-20

I am working on an R function that generates a ranked and sorted index from two user inputs: a list of starting values and the total number of slots the index must fill. If the list contains fewer values than there are slots, sequential integers are inserted into the gaps. Note that the first index slot must always be 1 (if 1.1 is not in the list) or 1.1 (if 1.1 is in the list).
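The first-slot rule can be written as a small check, which makes it easy to test any candidate index against it. This is a minimal sketch: `check_first_slot()` is a hypothetical helper, not part of the original code, and it compares with a tolerance because the values are doubles.

```r
# Hypothetical helper (not from the original post): returns TRUE when the first
# slot of `index` follows the rule: 1.1 if 1.1 is among the inputs, otherwise 1.
check_first_slot <- function(index, Value) {
  # tolerance-based membership test, since the inputs are doubles
  includes_1_1 <- any(abs(Value - 1.1) < 1e-9)
  expected_first <- if (includes_1_1) 1.1 else 1
  isTRUE(all.equal(index[1], expected_first))
}

check_first_slot(c(1.1, 1.2, 2.1, 2.2, 3), c(2.1, 1.2, 1.1, 2.2))  # TRUE (Example 1 output)
check_first_slot(c(2.1, 2.2, 3), c(2.1, 2.2))                      # FALSE: should start at 1
```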

I used dplyr::dense_rank() in the Reproducible Code for Example 1 at the bottom of this post, and it correctly fills in the gaps sequentially when the provided list elements are all less than the total number of slots to fill.

Is there a way to use dplyr::dense_rank(), or another function, to fill in the gaps when the list elements are all greater than 1 (or 1.1), as illustrated in Examples 2 and 3 in the images below, or when there are other gaps between the list elements, as illustrated in Example 4? The gaps I'm trying to fill are highlighted in yellow in the images. The Reproducible Code at the bottom also provides the user inputs for Examples 2-4, commented out since I ran Example 1.


Example 1 Reproducible Code output (which is correct, given the Value and totalSlots inputs):

# A tibble: 5 x 2
   Slot Value
  <int> <dbl>
1     1   1.1
2     2   1.2
3     3   2.1
4     4   2.2
5     5   3 

Reproducible Code:

library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr

# Example 1:
Value <- c(2.1, 1.2, 1.1, 2.2)
totalSlots <- 5

# Example 2:
# Value <- c(2.1, 2.2)
# totalSlots <- 3

# Example 3:
# Value <- c(4.1, 4.2, 4.3)
# totalSlots <- 6

# Example 4:
# Value <- c(1.1, 1.2, 3.1, 3.2, 3.3, 6.1, 6.2)
# totalSlots <- 10

tibble(Value) %>% 
  mutate(Slot = row_number()) %>% 
  complete(Slot = seq_len(totalSlots)) %>% 
  mutate(
    Value = coalesce(Value[order(Value)], Slot), 
    # rank the integer parts 1, 2, ... and add back each value's decimal part
    Value = dense_rank(as.integer(Value)) + Value - as.integer(Value)
    )
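The reason this pipeline only works for Example 1 is that dense_rank() assigns ranks 1, 2, ... to the distinct integer parts it sees, so it always renumbers them starting from 1 and cannot leave a gap before, say, 2.1. Running the same pipeline unchanged on the Example 2 inputs shows this (a sketch; only the inputs differ from the code above):

```r
library(dplyr)
library(tidyr)  # for complete()

# Example 2 inputs
Value <- c(2.1, 2.2)
totalSlots <- 3

result <- tibble(Value) %>%
  mutate(Slot = row_number()) %>%
  complete(Slot = seq_len(totalSlots)) %>%
  mutate(
    Value = coalesce(Value[order(Value)], Slot),
    Value = dense_rank(as.integer(Value)) + Value - as.integer(Value)
  )

# The integer part 2 is re-ranked as 1, giving approximately 1.1, 1.2, 2
# instead of an index that starts at 1 followed by 2.1 and 2.2.
result$Value
```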

Here is Richard Berry's solution, which generates a two-column data frame:

indexDF <- data.frame(
  Slot  = 1:totalSlots,
  Value = sort(c(setdiff(1:totalSlots, floor(Value)), Value))[1:totalSlots]
)
indexDF
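The same one-liner can be wrapped in a function and checked against all four examples at once. This is a sketch: `fill_index()` is a hypothetical name, and the values in the comments were worked out by applying the one-liner to each set of inputs.

```r
# Hypothetical wrapper around the setdiff()/sort() one-liner
fill_index <- function(Value, totalSlots) {
  # integers 1..totalSlots whose slot is not already covered by an integer part of Value
  fillers <- setdiff(seq_len(totalSlots), floor(Value))
  # merge with the provided values, sort, and keep the first totalSlots entries
  sort(c(fillers, Value))[seq_len(totalSlots)]
}

fill_index(c(2.1, 1.2, 1.1, 2.2), 5)                 # Example 1: 1.1 1.2 2.1 2.2 3
fill_index(c(2.1, 2.2), 3)                           # Example 2: 1 2.1 2.2
fill_index(c(4.1, 4.2, 4.3), 6)                      # Example 3: 1 2 3 4.1 4.2 4.3
fill_index(c(1.1, 1.2, 3.1, 3.2, 3.3, 6.1, 6.2), 10) # Example 4: 1.1 1.2 2 3.1 3.2 3.3 4 5 6.1 6.2
```

Because every slot number not covered by a provided value contributes one filler, the sorted vector always has at least totalSlots elements, so the final subset never produces NA.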

Answer:

You can achieve this with:

sort(c(setdiff(1:totalSlots, floor(Value)), Value))[1:totalSlots]

Breaking it down:

1:totalSlots %>%            # candidate integers to fill the gaps
  setdiff(floor(Value)) %>% # drop integers whose slot is already covered by Value
  c(Value) %>%              # combine with Value
  sort()                    # put everything in ascending order

Then keep only the first totalSlots elements with [1:totalSlots].
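To make the breakdown concrete, here is a sketch that stores each intermediate result for the Example 4 inputs. It uses base R only, and the variable names `candidates`, `fillers`, `combined` and `index` are illustrative.

```r
# Example 4 inputs
Value <- c(1.1, 1.2, 3.1, 3.2, 3.3, 6.1, 6.2)
totalSlots <- 10

candidates <- 1:totalSlots                       # 1 2 3 4 5 6 7 8 9 10
fillers    <- setdiff(candidates, floor(Value))  # 2 4 5 7 8 9 10 (1, 3 and 6 are covered)
combined   <- sort(c(fillers, Value))            # 7 fillers + 7 values = 14 sorted numbers
index      <- combined[1:totalSlots]             # 1.1 1.2 2 3.1 3.2 3.3 4 5 6.1 6.2
```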

Answer:

A potential solution: extract the first digit of each Value to test whether each slot number is already present, then fill in the missing Values to complete the total number of slots:

tibble(Value) |> 
  mutate(initial_int = as.numeric(stringr::str_extract(Value, "^\\d"))) |> 
  full_join(tibble(initial_int = 1:totalSlots), by = "initial_int") |> 
  mutate(Value = if_else(is.na(Value), initial_int, Value)) |> 
  arrange(Value) |> 
  head(totalSlots) |>                  # use totalSlots rather than a hard-coded 10
  mutate(Slot = seq_len(totalSlots)) |> 
  select(-initial_int)
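The regular expression `^\\d` reads only the first digit, so a value such as 11.1 would be grouped under slot 1. A sketch of the same join approach that uses `floor()` instead avoids that. The inputs below are hypothetical, chosen only to include a two-digit integer part, and `coalesce()`/`slice_head()` stand in for `if_else()`/`head()` as a stylistic choice.

```r
library(dplyr)

# Hypothetical inputs with a two-digit integer part
Value <- c(11.1, 11.2)
totalSlots <- 12

result <- tibble(Value) |>
  mutate(initial_int = floor(Value)) |>                 # integer part of each value
  full_join(tibble(initial_int = seq_len(totalSlots)),
            by = "initial_int") |>                      # one row per slot number
  mutate(Value = coalesce(Value, as.numeric(initial_int))) |>  # uncovered slot numbers become fillers
  arrange(Value) |>
  slice_head(n = totalSlots) |>
  mutate(Slot = row_number()) |>
  select(Slot, Value)

result$Value  # 1 to 10, then 11.1 and 11.2
```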