Hi all I wonder if someone could help me with something thats got me stuck starting out in R? Similar to this post but in R.
Essentially the opposite of the R.utils function seqToHumanReadable.
Namely
Title | LLFCs | Red |
---|---|---|
a | 15, 18 | 11.65 |
b | 16 & 19 | 9.08 |
c | 112-114 | 6.45 |
d | 113-115,145-147 | 6.45 |
Would become:
Title | LLFCs | Red |
---|---|---|
a | 15 | 11.65 |
a | 18 | 11.65 |
b | 16 | 9.08 |
b | 19 | 9.08 |
c | 112 | 6.45 |
c | 113 | 6.45 |
c | 114 | 6.45 |
d | 113 | 6.45 |
d | 114 | 6.45 |
d | 115 | 6.45 |
d | 145 | 6.45 |
d | 146 | 6.45 |
d | 147 | 6.45 |
Thanks so much in advance for any help!
I started trying an if, working out the difference from each of the hyhenated numbers and running though this many times with a loop trying to add additional rows but got in an awful mess.
EDIT: Wow thank you all so much for the replies. Solved my problem and given me a whole lot more to read up on!
CodePudding user response:
Looks a bit cumbersome but it returns your expected output:
library(dplyr)
df %>%
mutate(LLFCs = gsub('-', ':', LLFCs)) %>%
separate_rows(LLFCs, sep = ',') %>%
rowwise() %>%
mutate(LLFCs = ifelse(grepl(':', LLFCs), toString(eval(parse(text = LLFCs))), LLFCs)) %>%
separate_rows(LLFCs)
# A tibble: 13 × 3
Title LLFCs Red
<chr> <chr> <dbl>
1 a 15 11.6
2 a 18 11.6
3 b 16 9.08
4 b 19 9.08
5 c 112 6.45
6 c 113 6.45
7 c 114 6.45
8 d 113 6.45
9 d 114 6.45
10 d 115 6.45
11 d 145 6.45
12 d 146 6.45
13 d 147 6.45
CodePudding user response:
Use gsubfn
to replace x-y with the desired sequence. It is like gsub
except that the matches to the capture groups (parenthesized portions) in the pattern are input into the function (given in formula notation as second argument) and then the entire match is replaced with the output of the function. Then use separate_rows
to separate the elements and put each in its own row.
library(dplyr)
library(gsubfn)
library(tidyr)
DF %>%
mutate(LLFCs = gsubfn("(\\d )-(\\d )",
~ toString(as.numeric(x):as.numeric(y)),
LLFCs)) %>%
separate_rows(LLFCs)
giving:
# A tibble: 13 × 3
Title LLFCs Red
<chr> <chr> <dbl>
1 a 15 11.6
2 a 18 11.6
3 b 16 9.08
4 b 19 9.08
5 c 112 6.45
6 c 113 6.45
7 c 114 6.45
8 d 113 6.45
9 d 114 6.45
10 d 115 6.45
11 d 145 6.45
12 d 146 6.45
13 d 147 6.45
Note
The input in reproducible form:
DF <- structure(list(Title = c("a", "b", "c", "d"), LLFCs = c("15, 18",
"16 & 19", "112-114", "113-115,145-147"), Red = c(11.65, 9.08,
6.45, 6.45)), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
A solution using stringi::stri_replace_all_regex
.
Map(data.frame,
stringi::stri_replace_all_regex(sprintf('c(%s)', dat$LLFCs),
c(' & ', '-'), c(', ', ':'), vectorize_all=F) |>
lapply(\(.) cbind(LLFCs=eval(parse(text=.)))),
asplit(dat[c(1, 3)], 1) |> lapply(rbind)
) |> do.call(what=rbind)
# LLFCs Title Red
# 1 15 a 11.65
# 2 18 a 11.65
# 3 16 b 9.08
# 4 19 b 9.08
# 5 112 c 6.45
# 6 113 c 6.45
# 7 114 c 6.45
# 8 113 d 6.45
# 9 114 d 6.45
# 10 115 d 6.45
# 11 145 d 6.45
# 12 146 d 6.45
# 13 147 d 6.45
Data:
dat <- structure(list(Title = c("a", "b", "c", "d"), LLFCs = c("15, 18",
"16 & 19", "112-114", "113-115,145-147"), Red = c(11.65, 9.08,
6.45, 6.45)), class = "data.frame", row.names = c(NA, -4L))