Use tidyr::separate on column with variable number of separator characters

Time:09-16

I need to parse a column built from folder paths. Some folders have many subfolders, so the number of "/" characters varies from row to row. How can I separate on "/" and end up with many columns, some of which will be NA where there were no further subfolders?

Reprex:

df <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2), 
            id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
                   "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv", 
                   "C:/Users/me/big_folder/228/hij/553/statistics.csv"))  

separate(df, id, sep = "/", into = c(a, b, c, d, e, f, g, h, i, j))

Answer:

According to ?separate:

into - Names of new variables to create as character vector. Use NA to omit the variable in the output.
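That last sentence also lets you drop pieces you do not need. A minimal sketch, assuming the df from the question and tidyr's separate(): the output names a to h are arbitrary, and fill = "right" is added so that shorter paths are padded with NA without a warning.

```r
library(tidyr)

df <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
                 id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
                        "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv",
                        "C:/Users/me/big_folder/228/hij/553/statistics.csv"))

# The first three pieces ("C:", "Users", "me") get NA names, so they are omitted.
# 3 + 8 = 11 names covers the longest path (row 2 splits into 11 pieces).
res <- separate(df, id, sep = "/",
                into = c(NA, NA, NA, letters[1:8]),
                fill = "right")  # pad shorter paths with NA on the right, silently
res
```

Here every remaining column starts at big_folder, and rows 1 and 3 end in NA where their paths are shorter.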

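The OP's current code passes the names to into without quotes, so R tries to evaluate a, b, ... as objects instead of using them as column names. Either use the built-in character vector letters: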

library(tidyr)
separate(df, "id", sep = "/", into = letters[1:10])

or spell the names out as strings:

separate(df, "id", sep = "/", into = 
        c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))

Output:

  group var1  a     b  c          d             e    f   g              h    i              j
1     a    3 C: Users me big_folder little_folder plot 783            abc  551 statistics.csv
2     b    1 C: Users me big_folder little_folder plot rep            634  efg            552
3     c    2 C: Users me big_folder           228  hij 553 statistics.csv <NA>           <NA>
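Note that row 2 actually splits into 11 pieces, so with only 10 names separate() discards its final statistics.csv (under the default extra = "warn" it warns about this), and row 3 is padded with NA (also with a warning under the default fill = "warn"). A sketch that sizes into from the data instead, assuming base R's strsplit() to count the pieces; the part_ prefix is a hypothetical naming choice:

```r
library(tidyr)

df <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
                 id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
                        "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv",
                        "C:/Users/me/big_folder/228/hij/553/statistics.csv"))

# Number of pieces in the longest path, so no piece is dropped
n_pieces <- max(lengths(strsplit(as.character(df$id), "/", fixed = TRUE)))

# Hypothetical column names: part_01, part_02, ...
new_cols <- sprintf("part_%02d", seq_len(n_pieces))

# fill = "right" pads shorter paths with NA without a warning
res <- separate(df, id, sep = "/", into = new_cols, fill = "right")
res
```

Because the names are derived from the data, the same code keeps working if deeper folders are added later.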

Answer:

If you are looking for alternatives, you could use cSplit from the splitstackshape package. The advantage is that you do not have to define the resulting columns beforehand:

library(splitstackshape)
cSplit(df, "id", "/")

Output:

   group var1 id_01 id_02 id_03      id_04         id_05 id_06 id_07          id_08 id_09          id_10          id_11
1:     a    3    C: Users    me big_folder little_folder  plot   783            abc   551 statistics.csv           <NA>
2:     b    1    C: Users    me big_folder little_folder  plot   rep            634   efg            552 statistics.csv
3:     c    2    C: Users    me big_folder           228   hij   553 statistics.csv  <NA>           <NA>           <NA>
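For comparison, a base-R sketch of the same idea with no extra packages: split each path, pad every piece vector with NA up to the longest length, and bind the rows. The id_ names only mimic cSplit's output; unlike cSplit, which returns a data.table and by default converts column types, this sketch keeps every piece as character.

```r
df <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
                 id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
                        "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv",
                        "C:/Users/me/big_folder/228/hij/553/statistics.csv"))

# Split each path on a literal "/" (fixed = TRUE skips regex matching)
parts <- strsplit(as.character(df$id), "/", fixed = TRUE)
n <- max(lengths(parts))

# `length<-` pads each shorter vector with NA up to n elements
mat <- do.call(rbind, lapply(parts, `length<-`, n))
colnames(mat) <- sprintf("id_%02d", seq_len(n))

# Keep the other columns and append the split pieces
res <- cbind(df[c("group", "var1")], as.data.frame(mat, stringsAsFactors = FALSE))
res
```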