How can I separate the information in one column of the table I have into 3 separate columns?

Time:02-14

For example, one column of my table looks like this:

HGVS.Consequence
    Lys10Arg
    Lys10Lys
    LeullLeu
    Phe12Ser
    Phe12Cys
    lle13Leu
    lle13Val
    lle13Phe
    Thr15Pro

And I want a table like this.

Mutation  Ref  Change  Position
lle13Val  lle  Val     13
lle13Phe  lle  Phe     13
Thr15Pro  Thr  Pro     15

CodePudding user response:

tidyr::extract(df, HGVS.Consequence,
     c('Ref', 'Position', 'Change'), '(\\D+)(\\d+)(\\D+)', remove = FALSE)
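
For readers without tidyr, the same capture-group idea can be sketched in base R with utils::strcapture(). This is only a sketch, not the answerer's code: x is a three-row excerpt of the question's column, and the column types in proto are an assumption. A value that does not match the pattern, such as LeullLeu (letters where the digits should be), comes back as NA instead of being split.

```r
# Excerpt of the question's column (assumption: a plain character vector).
x <- c("Lys10Arg", "LeullLeu", "Thr15Pro")

# One capture group per output column: letters, digits, letters.
pattern <- "(\\D+)(\\d+)(\\D+)"

# proto sets the names and types of the result columns.
proto <- data.frame(Ref = character(), Position = integer(), Change = character())

# Rows that do not match the pattern are filled with NA.
parts <- utils::strcapture(pattern, x, proto)

# Put the columns in the order requested in the question.
result <- cbind(Mutation = x, parts[, c("Ref", "Change", "Position")])
print(result)
```

Declaring Position as integer in proto means the position is ready for numeric filtering or sorting without a separate conversion step.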

CodePudding user response:

Using tidyr::separate and tidying the ordering / names with dplyr:

tidyr::separate(data   = df, 
                col    = HGVS.Consequence, 
                into   = c("Ref", "Position", "Change"), 
                sep    = c(3, 5), 
                remove = FALSE) |>
  dplyr::select(1, 2, 4, 3) |>
  dplyr::rename(mutation = HGVS.Consequence)

#>   mutation Ref Change Position
#> 1 Lys10Arg Lys    Arg       10
#> 2 Lys10Lys Lys    Lys       10
#> 3 LeullLeu Leu    Leu       ll
#> 4 Phe12Ser Phe    Ser       12
#> 5 Phe12Cys Phe    Cys       12
#> 6 lle13Leu lle    Leu       13
#> 7 lle13Val lle    Val       13
#> 8 lle13Phe lle    Phe       13
#> 9 Thr15Pro Thr    Pro       15
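
Numeric sep positions split every value at the same character offsets, which is why row 3 ends up with ll as its Position: the input LeullLeu has letters where the digits should be. Below is a minimal base R sketch of a layout check that could be run before splitting. It assumes every valid entry is three letters, two digits, three letters (the layout the fixed positions rely on); x, layout, ok and bad are illustrative names.

```r
# The question's column as a plain character vector.
x <- c("Lys10Arg", "Lys10Lys", "LeullLeu", "Phe12Ser", "Phe12Cys",
       "lle13Leu", "lle13Val", "lle13Phe", "Thr15Pro")

# Assumed layout: 3 letters, 2 digits, 3 letters.
layout <- "^[A-Za-z]{3}[0-9]{2}[A-Za-z]{3}$"

ok  <- grepl(layout, x)   # TRUE where the value fits the assumed layout
bad <- x[!ok]             # entries that fixed-position splitting would mis-split
print(bad)
```

Entries collected in bad can then be corrected at the source or dropped before the split.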

CodePudding user response:

Code

Here is a base R way with substr.

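sepfun <- function(x){
  s1 <- substr(x, 1, 3)   # reference residue: characters 1-3
  s2 <- substr(x, 4, 5)   # position: characters 4-5 (assumes two digits)
  s3 <- substring(x, 6)   # changed residue: character 6 to the end
  # bind the pieces in the requested column order: Ref, Change, Position
  y <- do.call(cbind.data.frame, list(s1, s3, s2))
  names(y) <- c("Ref", "Change", "Position")
  cbind(Mutation = x, y)
}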

sepfun(df1$HGVS.Consequence)
#>   Mutation Ref Change Position
#> 1 Lys10Arg Lys    Arg       10
#> 2 Lys10Lys Lys    Lys       10
#> 3 LeullLeu Leu    Leu       ll
#> 4 Phe12Ser Phe    Ser       12
#> 5 Phe12Cys Phe    Cys       12
#> 6 lle13Leu lle    Leu       13
#> 7 lle13Val lle    Val       13
#> 8 lle13Phe lle    Phe       13
#> 9 Thr15Pro Thr    Pro       15

Created on 2022-02-13 by the reprex package (v2.0.1)
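
substr() with fixed offsets assumes a two-digit position. Should the column also hold positions of other widths, one possible variant locates the run of digits with regexpr() instead. This is a sketch under that assumption: sepfun_var and the inputs Gly5Ala and Trp123Ter are hypothetical and do not appear in the question.

```r
# Hypothetical variant of sepfun() that does not assume a two-digit position.
sepfun_var <- function(x) {
  m     <- regexpr("[0-9]+", x)        # first run of digits; -1 if there is none
  start <- as.integer(m)
  len   <- attr(m, "match.length")
  found <- start > 0

  # Split around the digit run; values without digits become NA.
  ref <- ifelse(found, substr(x, 1, start - 1), NA_character_)
  pos <- ifelse(found, substr(x, start, start + len - 1), NA_character_)
  chg <- ifelse(found, substring(x, start + len), NA_character_)

  data.frame(Mutation = x, Ref = ref, Change = chg, Position = as.integer(pos))
}

# Gly5Ala and Trp123Ter are made-up inputs used only to exercise other widths.
out <- sepfun_var(c("Lys10Arg", "Gly5Ala", "Trp123Ter", "LeullLeu"))
print(out)
```

Unlike the fixed-offset version, a value with no digits at all (LeullLeu) yields NA in Ref, Change and Position, so it is easy to spot afterwards.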

Data

HGVS.Consequence <- scan(text = '
Lys10Arg
Lys10Lys
LeullLeu
Phe12Ser
Phe12Cys
lle13Leu
lle13Val
lle13Phe
Thr15Pro
', sep = "\n", what = character())
df1 <- data.frame(HGVS.Consequence)

