How do you unwrap hyphen and comma-separated number ranges in R? Eg 1,3,5-7 -> 1,3,5,6,7 on diffe-CodePudding

Hi all I wonder if someone could help me with something thats got me stuck starting out in R? Similar to this post but in R.

Essentially the opposite of the R.utils function seqToHumanReadable.

Namely

Title	LLFCs	Red
a	15, 18	11.65
b	16 & 19	9.08
c	112-114	6.45
d	113-115,145-147	6.45

Would become:

Title	LLFCs	Red
a	15	11.65
a	18	11.65
b	16	9.08
b	19	9.08
c	112	6.45
c	113	6.45
c	114	6.45
d	113	6.45
d	114	6.45
d	115	6.45
d	145	6.45
d	146	6.45
d	147	6.45

Thanks so much in advance for any help!

I started trying an if, working out the difference from each of the hyhenated numbers and running though this many times with a loop trying to add additional rows but got in an awful mess.

EDIT: Wow thank you all so much for the replies. Solved my problem and given me a whole lot more to read up on!

CodePudding user response：

Looks a bit cumbersome but it returns your expected output:

library(dplyr)

df %>% 
 mutate(LLFCs = gsub('-', ':', LLFCs)) %>% 
 separate_rows(LLFCs, sep = ',') %>% 
 rowwise() %>% 
 mutate(LLFCs = ifelse(grepl(':', LLFCs), toString(eval(parse(text = LLFCs))), LLFCs)) %>% 
 separate_rows(LLFCs)

# A tibble: 13 × 3
   Title LLFCs   Red
   <chr> <chr> <dbl>
 1 a     15    11.6 
 2 a     18    11.6 
 3 b     16     9.08
 4 b     19     9.08
 5 c     112    6.45
 6 c     113    6.45
 7 c     114    6.45
 8 d     113    6.45
 9 d     114    6.45
10 d     115    6.45
11 d     145    6.45
12 d     146    6.45
13 d     147    6.45

CodePudding user response：

Use gsubfn to replace x-y with the desired sequence. It is like gsub except that the matches to the capture groups (parenthesized portions) in the pattern are input into the function (given in formula notation as second argument) and then the entire match is replaced with the output of the function. Then use separate_rows to separate the elements and put each in its own row.

library(dplyr)
library(gsubfn)
library(tidyr)

DF %>%
 mutate(LLFCs = gsubfn("(\\d )-(\\d )", 
                       ~ toString(as.numeric(x):as.numeric(y)),
                       LLFCs)) %>%
 separate_rows(LLFCs)

giving:

# A tibble: 13 × 3
   Title LLFCs   Red
   <chr> <chr> <dbl>
 1 a     15    11.6 
 2 a     18    11.6 
 3 b     16     9.08
 4 b     19     9.08
 5 c     112    6.45
 6 c     113    6.45
 7 c     114    6.45
 8 d     113    6.45
 9 d     114    6.45
10 d     115    6.45
11 d     145    6.45
12 d     146    6.45
13 d     147    6.45

Note

The input in reproducible form:

DF <- structure(list(Title = c("a", "b", "c", "d"), LLFCs = c("15, 18", 
"16 & 19", "112-114", "113-115,145-147"), Red = c(11.65, 9.08, 
6.45, 6.45)), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response：

A solution using stringi::stri_replace_all_regex.

Map(data.frame, 
    stringi::stri_replace_all_regex(sprintf('c(%s)', dat$LLFCs), 
                                    c(' & ', '-'), c(', ', ':'), vectorize_all=F) |>
      lapply(\(.) cbind(LLFCs=eval(parse(text=.)))),
    asplit(dat[c(1, 3)], 1) |> lapply(rbind)
) |> do.call(what=rbind)
#    LLFCs Title   Red
# 1     15     a 11.65
# 2     18     a 11.65
# 3     16     b  9.08
# 4     19     b  9.08
# 5    112     c  6.45
# 6    113     c  6.45
# 7    114     c  6.45
# 8    113     d  6.45
# 9    114     d  6.45
# 10   115     d  6.45
# 11   145     d  6.45
# 12   146     d  6.45
# 13   147     d  6.45

Data:

dat <- structure(list(Title = c("a", "b", "c", "d"), LLFCs = c("15, 18", 
"16 & 19", "112-114", "113-115,145-147"), Red = c(11.65, 9.08, 
6.45, 6.45)), class = "data.frame", row.names = c(NA, -4L))