Selecting all but the first element of a vector in data frame

Time:01-26

I have some data that looks like this:

X1
A,B,C,D,E
A,B
A,B,C,D
A,B,C,D,E,F

I want to generate one column that holds the first element of each vector ("A"), and another column that holds all the rest of the values ("B","C" etc.):

X1              Col1    Col2
A,B,C,D,E       A       B,C,D,E
A,B             A       B
A,B,C,D         A       B,C,D
A,B,C,D,E,F     A       B,C,D,E,F

I have tried the following:

library(dplyr)

testdata <- data.frame(X1 = c("A,B,C,D,E",
                              "A,B",
                              "A,B,C,D",
                              "A,B,C,D,E,F")) %>%
  mutate(Col1 = sapply(strsplit(X1, ","), "[", 1),
         Col2 = sapply(strsplit(X1, ","), "[", -1))

However, I cannot seem to get rid of the pesky vector brackets around the values in Col2. Is there a way of doing this?

CodePudding user response:

You can use tidyr::separate with extra = "merge":

testdata %>%
  tidyr::separate(X1, into = c("Col1", "Col2"), sep = ",", extra = "merge", remove = FALSE)

           X1 Col1      Col2
1   A,B,C,D,E    A   B,C,D,E
2         A,B    A         B
3     A,B,C,D    A     B,C,D
4 A,B,C,D,E,F    A B,C,D,E,F
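
For comparison, the sapply() attempt from the question can probably be repaired as well. With "[", -1 each row yields a character vector of a different length, so Col2 becomes a list column; collapsing each vector back into one string with paste() avoids that. The following is only a minimal base R sketch under that assumption, reusing the testdata name from the question:

```r
# Same data as in the question
testdata <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Split once and reuse the pieces for both columns
parts <- strsplit(testdata$X1, ",", fixed = TRUE)

# First element of each split vector
testdata$Col1 <- vapply(parts, function(p) p[1], character(1))

# Everything but the first element, collapsed back into one string per row
testdata$Col2 <- vapply(parts, function(p) paste(p[-1], collapse = ","), character(1))

testdata
```

vapply() with character(1) is used here instead of sapply() so that each row is guaranteed to return exactly one string.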

CodePudding user response:

A possible solution, using tidyr::separate:

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F")
)

# The lookbehind splits only at the comma that follows the first character
df %>%
  separate(X1, into = str_c("col", 1:2), sep = "(?<=^.),", remove = FALSE)

#>            X1 col1      col2
#> 1   A,B,C,D,E    A   B,C,D,E
#> 2         A,B    A         B
#> 3     A,B,C,D    A     B,C,D
#> 4 A,B,C,D,E,F    A B,C,D,E,F
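
Note that the pattern "(?<=^.)," only matches a comma directly after a single leading character, so it relies on the first element being exactly one character long, as in the example data. If the first element can be longer, a base R variant along these lines might help. This is a sketch only; the multi-character values below are made up for illustration and are not from the question:

```r
# Hypothetical data with first elements of varying length
df <- data.frame(
  X1 = c("AA,B,C", "A,B", "ABC,D,E,F"),
  stringsAsFactors = FALSE
)

# Drop everything from the first comma onwards: keeps the first element
df$col1 <- sub(",.*$", "", df$X1)

# Drop everything up to and including the first comma: keeps the rest
df$col2 <- sub("^[^,]*,", "", df$X1)

df
```

A value without any comma would be left unchanged by both calls, so it would end up in col1 and col2 alike; that case would need separate handling.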

CodePudding user response:

Try the base R code below, which uses sub() to replace the first comma with a space and read.table() to split on that space:

cbind(
  df,
  read.table(
    text = sub(",", " ", df$X1)
  )
)

which gives

           X1 V1        V2
1   A,B,C,D,E  A   B,C,D,E
2         A,B  A         B
3     A,B,C,D  A     B,C,D
4 A,B,C,D,E,F  A B,C,D,E,F

CodePudding user response:

You can use the str_sub() function as follows:

> df
# A tibble: 4 x 1
  X1         
  <chr>      
1 A,B,C,D,E  
2 A,B        
3 A,B,C,D    
4 A,B,C,D,E,F

> df %>% mutate(X2 = str_sub(X1, 1,1), X3 = str_sub(X1, 3))
# A tibble: 4 x 3
  X1          X2    X3       
  <chr>       <chr> <chr>    
1 A,B,C,D,E   A     B,C,D,E  
2 A,B         A     B        
3 A,B,C,D     A     B,C,D    
4 A,B,C,D,E,F A     B,C,D,E,F
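
The fixed positions 1 and 3 work because every first element in the example is a single character. If that cannot be assumed, the position of the first comma can be looked up per row instead. Below is a rough base R sketch of that idea, using regexpr() and substr() in place of str_sub(); the sample values are hypothetical:

```r
# Hypothetical data where the first element is not always one character
df <- data.frame(
  X1 = c("AB,C,D", "A,B"),
  stringsAsFactors = FALSE
)

# Position of the first comma in each string
pos <- as.integer(regexpr(",", df$X1, fixed = TRUE))

# Characters before the first comma
df$X2 <- substr(df$X1, 1, pos - 1)

# Characters after the first comma, up to the end of the string
df$X3 <- substr(df$X1, pos + 1, nchar(df$X1))

df
```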