I have a data frame in R with a column (sni) with numbers that looks like this etc
bransch | sni |
---|---|
name | 15 |
name | 15 |
name | 16-18 |
somename | 16-18 |
name | 241-3 |
someothername | 241-3 |
where I have to transform/create a new column with just one number per row, i.e. no intervals so for example it should be a new row for all individual values in the intervals and look like this
bransch | sni |
---|---|
name | 15 |
name | 15 |
name | 16 |
name | 17 |
name | 18 |
somename | 16 |
somename | 17 |
somename | 18 |
name | 241 |
name | 242 |
name | 243 |
someothername | 241 |
someothername | 242 |
someothername | 243 |
I'm a bit unsure which function can do this the best way, or if someone has stumble upon a similar problem/solution. Currently I have tried to split the sni column (where the "-" starts) into two new ones, but then I'm a bit stuck since I will have many rows in one of the new columns without any values etc. Also the column is a character at the moment.
Any advice? Sincerely, TS
CodePudding user response:
I took a while. Here is tidyverse approach:
library(dplyr)
library(tidyr)
df %>%
separate(sni, c("x", "y")) %>%
as_tibble() %>%
mutate(y = ifelse(as.numeric(y)<=9, paste0(substr(x, 1, nchar(x)-1), y),
y)) %>%
mutate(id = row_number()) %>%
pivot_longer(c(x,y)) %>%
mutate(value = as.numeric(value)) %>%
group_by(col2 =as.integer(gl(n(),2,n()))) %>%
fill(value, .direction = "down") %>%
complete(value = seq(first(value), last(value), by=1)) %>%
fill(bransch, .direction = "down") %>%
select(bransch, sni=value) %>%
group_by(col2, sni) %>%
slice(1)
col2 bransch sni
<int> <chr> <dbl>
1 1 name 15
2 2 name 15
3 3 name 16
4 3 name 17
5 3 name 18
6 4 somename 16
7 4 somename 17
8 4 somename 18
9 5 name 241
10 5 name 242
11 5 name 243
12 6 someothername 241
13 6 someothername 242
14 6 someothername 243
CodePudding user response:
Let's try this.
Assume only three digits interval would have the pattern of 123-5
instead of 123-125
, therefore in the ifelse
, we modify this special pattern (e.g. 123-5
) of interval into more regular one (123-125
). Then separate the interval to individual integer using separate_rows
.
We can then use complete
to fill in the missing sequence in the interval.
library(tidyverse)
df %>%
group_by(sni,bransch) %>%
mutate(sni2 = ifelse(grepl("-", sni) & nchar(sub("-.*$", "", sni)) >= 3,
sub("^(\\d\\d)(.)-", "\\1\\2-\\1", sni),
sni)) %>%
separate_rows(sni2, convert = T) %>%
complete(sni2 = min(sni2):max(sni2)) %>%
ungroup() %>%
select(-sni)
# A tibble: 14 × 2
bransch sni2
<chr> <int>
1 name 15
2 name 15
3 name 16
4 name 17
5 name 18
6 somename 16
7 somename 17
8 somename 18
9 name 241
10 name 242
11 name 243
12 someothername 241
13 someothername 242
14 someothername 243
CodePudding user response:
If I understood correctly
tmp=setNames(strsplit(df$sni,"-"),df$bransch)
tmp=unlist(
lapply(tmp,function(x){
x=as.numeric(x)
if (length(x)>1) {
if (x[1]<x[2]) {
seq(x[1],x[2],1)
} else {
seq(x[1],x[1] x[2]-1,1)
}
} else {
x
}
})
)
data.frame(
"bransch"=names(tmp),
"sni"=tmp
)
bransch sni
1 name 15
2 name 15
3 name1 16
4 name2 17
5 name3 18
6 somename1 16
7 somename2 17
8 somename3 18
9 name1 241
10 name2 242
11 name3 243
12 someothername1 241
13 someothername2 242
14 someothername3 243
CodePudding user response:
Follow up question, in a similiar data frame I also have the following (last two rows)
bransch | sni |
---|---|
name | 15 |
name | 15 |
name | 16-18 |
somename | 16-18 |
name | 241-3 |
someothername | 241-3 |
name | 45 47 |
someothername | 4719 474-9 |
suggestions?
Sincerely, TS
CodePudding user response:
Using separate to get the start and end of the sequence, the we can map and unnest to get the result.
library (tidyverse)
data %>%
separate(
sni,
into = c("from", "to"),
fill = "right",
convert = TRUE) %>%
mutate(to = if_else(is.na(to), from, to)) %>%
transmute(
bransch,
sni = map2(from, to, `:`)) %>%
unnest_longer(sni)
# A tibble: 14 x 2
bransch sni
<chr> <int>
1 name 15
2 name 15
3 name 16
4 name 17
5 name 18
6 some name 16
7 some name 17
8 some name 18
9 name 241
10 name 242
11 name 243
12 someothername 241
13 someothername 242
14 someothername 243
Data
data <- tibble(
bransch = c("name","name","name","some name","name","someothername"),
sni =c("15","15","16-18","16-18","241-243","241-243"))