Home > front end >  Manipulate/rearrange intervals in R columns
Manipulate/rearrange intervals in R columns

Time:08-20

I have a data frame in R with a column (sni) with numbers that looks like this etc

bransch sni
name 15
name 15
name 16-18
somename 16-18
name 241-3
someothername 241-3

where I have to transform/create a new column with just one number per row, i.e. no intervals so for example it should be a new row for all individual values in the intervals and look like this

bransch sni
name 15
name 15
name 16
name 17
name 18
somename 16
somename 17
somename 18
name 241
name 242
name 243
someothername 241
someothername 242
someothername 243

I'm a bit unsure which function can do this the best way, or if someone has stumble upon a similar problem/solution. Currently I have tried to split the sni column (where the "-" starts) into two new ones, but then I'm a bit stuck since I will have many rows in one of the new columns without any values etc. Also the column is a character at the moment.

Any advice? Sincerely, TS

CodePudding user response:

I took a while. Here is tidyverse approach:

library(dplyr)
library(tidyr)

df %>% 
  separate(sni, c("x", "y")) %>% 
  as_tibble() %>% 
  mutate(y = ifelse(as.numeric(y)<=9, paste0(substr(x, 1, nchar(x)-1), y),
                                 y)) %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(c(x,y)) %>% 
  mutate(value = as.numeric(value)) %>% 
  group_by(col2 =as.integer(gl(n(),2,n()))) %>% 
  fill(value, .direction = "down") %>% 
  complete(value = seq(first(value), last(value), by=1)) %>% 
  fill(bransch, .direction = "down") %>% 
  select(bransch, sni=value) %>% 
  group_by(col2, sni) %>% 
  slice(1)

   col2 bransch         sni
   <int> <chr>         <dbl>
 1     1 name             15
 2     2 name             15
 3     3 name             16
 4     3 name             17
 5     3 name             18
 6     4 somename         16
 7     4 somename         17
 8     4 somename         18
 9     5 name            241
10     5 name            242
11     5 name            243
12     6 someothername   241
13     6 someothername   242
14     6 someothername   243

CodePudding user response:

Let's try this.

Assume only three digits interval would have the pattern of 123-5 instead of 123-125, therefore in the ifelse, we modify this special pattern (e.g. 123-5) of interval into more regular one (123-125). Then separate the interval to individual integer using separate_rows.

We can then use complete to fill in the missing sequence in the interval.

library(tidyverse)

df %>% 
  group_by(sni,bransch) %>% 
  mutate(sni2 = ifelse(grepl("-", sni) & nchar(sub("-.*$", "", sni)) >= 3, 
                       sub("^(\\d\\d)(.)-", "\\1\\2-\\1", sni), 
                       sni)) %>% 
  separate_rows(sni2, convert = T) %>%
  complete(sni2 = min(sni2):max(sni2)) %>% 
  ungroup() %>% 
  select(-sni)

# A tibble: 14 × 2
   bransch        sni2
   <chr>         <int>
 1 name             15
 2 name             15
 3 name             16
 4 name             17
 5 name             18
 6 somename         16
 7 somename         17
 8 somename         18
 9 name            241
10 name            242
11 name            243
12 someothername   241
13 someothername   242
14 someothername   243

CodePudding user response:

If I understood correctly

tmp=setNames(strsplit(df$sni,"-"),df$bransch)
tmp=unlist(
  lapply(tmp,function(x){
    x=as.numeric(x)
    if (length(x)>1) {
      if (x[1]<x[2]) {
        seq(x[1],x[2],1)
      } else {
        seq(x[1],x[1] x[2]-1,1)
      }
    } else {
      x
    }
  })
)

data.frame(
  "bransch"=names(tmp),
  "sni"=tmp
)

          bransch sni
1            name  15
2            name  15
3           name1  16
4           name2  17
5           name3  18
6       somename1  16
7       somename2  17
8       somename3  18
9           name1 241
10          name2 242
11          name3 243
12 someothername1 241
13 someothername2 242
14 someothername3 243

CodePudding user response:

Follow up question, in a similiar data frame I also have the following (last two rows)

bransch sni
name 15
name 15
name 16-18
somename 16-18
name 241-3
someothername 241-3
name 45 47
someothername 4719 474-9

suggestions?

Sincerely, TS

CodePudding user response:

Using separate to get the start and end of the sequence, the we can map and unnest to get the result.

library (tidyverse) 
 data %>%
        separate(
        sni, 
        into = c("from", "to"),
        fill = "right",
        convert = TRUE) %>%
        mutate(to = if_else(is.na(to), from, to)) %>%
        transmute(
        bransch,
        sni = map2(from, to, `:`)) %>%
        unnest_longer(sni)

# A tibble: 14 x 2
   bransch         sni
   <chr>         <int>
 1 name             15
 2 name             15
 3 name             16
 4 name             17
 5 name             18
 6 some name        16
 7 some name        17
 8 some name        18
 9 name            241
10 name            242
11 name            243
12 someothername   241
13 someothername   242
14 someothername   243

Data

data <- tibble(
bransch = c("name","name","name","some name","name","someothername"),
sni =c("15","15","16-18","16-18","241-243","241-243"))
  • Related