I need to parse a column created from folder names. Some folders had many subfolders, resulting in a variable number of "/" characters in each value. How can I split on "/" and end up with many columns, some of which will be NA where there were no subfolders?
Reprex:
df <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
                 id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
                        "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv",
                        "C:/Users/me/big_folder/228/hij/553/statistics.csv"))
separate(df, id, sep = "/", into = c(a, b, c, d, e, f, g, h, i, j))
CodePudding user response:
According to ?separate
into - Names of new variables to create as character vector. Use NA to omit the variable in the output.
The OP's code passes the names to into without quotes, so R tries to evaluate a, b, c, ... as objects instead of treating them as column names. Either use the built-in vector letters
library(tidyr)
separate(df, "id", sep = "/", into = letters[1:10])
or use
separate(df, "id", sep = "/", into =
c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
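Output:
  group var1  a     b  c          d             e    f   g              h    i              j
1     a    3 C: Users me big_folder little_folder plot 783            abc  551 statistics.csv
2     b    1 C: Users me big_folder little_folder plot rep            634  efg            552
3     c    2 C: Users me big_folder           228  hij 553 statistics.csv <NA>           <NA>
Note that row 2 has 11 pieces, so with only 10 names its last piece (statistics.csv) is dropped with a warning. Use letters[1:11] to keep it.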
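If the maximum number of pieces is not known in advance, it can be computed from the data before calling separate. The following is a minimal sketch, assuming the df from the question; the names n_parts and part_1 ... part_n are made up for illustration, and fill = "right" is used so that shorter paths are padded with NA without a warning.

```r
library(tidyr)

df <- data.frame(
  group = c("a", "b", "c"),
  var1 = c(3, 1, 2),
  id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
         "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv",
         "C:/Users/me/big_folder/228/hij/553/statistics.csv"),
  stringsAsFactors = FALSE
)

# Count the pieces in each path and keep the largest count
n_parts <- max(lengths(strsplit(df$id, "/", fixed = TRUE)))

# Hypothetical column names part_1 ... part_n (not from the original answer);
# fill = "right" pads shorter rows with NA on the right
res <- separate(df, id, sep = "/",
                into = paste0("part_", seq_len(n_parts)),
                fill = "right")
```

With one name per piece of the longest path, no row is truncated.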
CodePudding user response:
In case you are looking for alternatives, we could use cSplit from the splitstackshape package.
The advantage is that we do not have to define the resulting columns beforehand:
library(splitstackshape)
cSplit(df, "id", "/")
Output:
   group var1 id_01 id_02 id_03      id_04         id_05 id_06 id_07          id_08 id_09          id_10          id_11
1:     a    3    C: Users    me big_folder little_folder  plot   783            abc   551 statistics.csv           <NA>
2:     b    1    C: Users    me big_folder little_folder  plot   rep            634   efg            552 statistics.csv
3:     c    2    C: Users    me big_folder           228   hij   553 statistics.csv  <NA>           <NA>           <NA>
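For comparison, the same "no predefined columns" result can be approximated in base R without extra packages. This is only a rough sketch, not how cSplit is implemented; the helper names parts, n_parts, and padded are made up here, and the id_01 ... id_11 column names simply mimic the output above.

```r
df <- data.frame(
  group = c("a", "b", "c"),
  var1 = c(3, 1, 2),
  id = c("C:/Users/me/big_folder/little_folder/plot/783/abc/551/statistics.csv",
         "C:/Users/me/big_folder/little_folder/plot/rep/634/efg/552/statistics.csv",
         "C:/Users/me/big_folder/228/hij/553/statistics.csv"),
  stringsAsFactors = FALSE
)

# Split every path on the literal "/" (fixed = TRUE skips regex matching)
parts <- strsplit(df$id, "/", fixed = TRUE)

# The longest path decides how many columns are needed
n_parts <- max(lengths(parts))

# Pad shorter paths with NA on the right, then stack the rows into a matrix
padded <- do.call(rbind, lapply(parts, function(p) {
  c(p, rep(NA_character_, n_parts - length(p)))
}))
colnames(padded) <- sprintf("id_%02d", seq_len(n_parts))

# Combine the untouched columns with the new split columns
res <- cbind(df[c("group", "var1")],
             as.data.frame(padded, stringsAsFactors = FALSE))
```

Unlike cSplit, this sketch keeps every new column as character and does no type conversion.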