Splitting column with numbers "by nothing" in a dataframe

Time:05-28

I am currently trying to import a txt dataset that contains only numbers. To import it I use:

k2= read.delim("dummy.txt", stringsAsFactors = F, header = F, sep="",colClasses = "character")

Unfortunately, this puts everything into a single column; it does not separate "by nothing". read.table gives the same result.

I am trying to split that column into multiple columns so that every single number is in its own column (there are 297 numbers and each should be separate). I tried these with tidyr:

1- k2 %>% separate(as.character(k2$V1), 1:297, "")

2- apply(k2,2,k2 %>% separate(as.character(k2$V1), 1:297, sep= ""))

But I get this message in both cases: Must extract column with a single valid subscript. x Subscript var has size 555514 but must be size 1.

I would like to either import my dataset with everything already split, or split it afterwards with code.

I am not good with the apply family, and good tutorial suggestions are also very welcome.

CodePudding user response:

You may scan the file as character, use strsplit to split on "nothing" (the empty string), rbind the pieces, and apply type.convert.

scan('foo.txt', what='A', quiet=TRUE) |> strsplit('') |> do.call(what=rbind) |> 
  type.convert(as.is=TRUE) |> as.data.frame()
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
# 1  1  1  1  1  2  2  2  1  9   9
# 2  1  1  2  2  2  9  9  1  1   9
# 3  1  9  1  1  2  9  2  1  2   2
# 4  9  9  2  2  2  2  1  1  2   2
# 5  9  9  1  1  2  2  2  2  2   9

Note: this uses the native pipe |>, which requires R >= 4.1.
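For readers on an older R, or who want to see each stage, the same scan / strsplit / rbind / type.convert chain can be sketched without the pipe. The temporary file and its two rows are made up for illustration; the variable names are arbitrary.

```r
# Hypothetical sample file standing in for the real data
tmp <- tempfile(fileext = ".txt")
writeLines(c("1112221199", "1122299119"), tmp)

lines  <- scan(tmp, what = "character", quiet = TRUE)  # one string per row
chars  <- strsplit(lines, "")                          # list of single-character vectors
mat    <- do.call(rbind, chars)                        # character matrix, one column per digit
digits <- as.data.frame(type.convert(mat, as.is = TRUE))  # convert to integer, then data frame
```

This assumes every row has the same number of characters; with rows of differing length, rbind would recycle values and the result would be wrong.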


Data:

set.seed(42)
replicate(5, paste(sample(c(1, 2, 9), 10, replace=TRUE), collapse='')) |> 
  as.matrix() |> writeLines('foo.txt')
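If the goal is to import the file with the digits already split (the other option the question asks about), a fixed-width read is one possibility. This is a sketch of a different technique, not part of the answer above: it uses utils::read.fwf with a width of 1 for every column, on a made-up temporary file. It may be slow on a file with hundreds of thousands of rows.

```r
# Hypothetical sample file standing in for the real data
tmp <- tempfile(fileext = ".txt")
writeLines(c("1112221199", "1122299119"), tmp)

# Take the number of digits from the first line (assumes all lines are equally long)
n_digits <- nchar(readLines(tmp, n = 1))

# Every field is exactly one character wide
fixed <- read.fwf(tmp, widths = rep(1, n_digits))
```

For the file in the question, widths would be rep(1, 297).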

CodePudding user response:

library(dplyr)

# nums <- read.table('nums.txt', colClasses = "character")

Say your data are structured like this:

nums <- structure(list(V1 = c("003111222", "212251256")), 
                  class = "data.frame", row.names = c(NA, -2L))

Then you can iterate over the character positions and put each substring into a separate column; together these form a data frame.

outs <- lapply(nums, function(x){
  x <- unname(x)
  n <- nchar(x[1])  # assumes every string in the column has the same length
  rows <- lapply(seq_len(n), function(i){
    val <- substr(x, i, i)  # i-th character of every string
    data.frame(val)
  }) %>% 
    bind_cols
  colnames(rows) <- paste0('col', seq_len(n))
  rows
}) %>% bind_rows

print(outs)
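As for the separate() attempts in the question: the error arises because the col argument must be a single bare column name, while as.character(k2$V1) passes the whole column of values. A minimal sketch of a working call, assuming tidyr is installed and all strings have the same length, gives sep as numeric split positions; the col1, col2, ... names are arbitrary.

```r
library(tidyr)

nums <- data.frame(V1 = c("003111222", "212251256"), stringsAsFactors = FALSE)
n <- nchar(nums$V1[1])  # assumes all strings share this length

split_cols <- separate(
  nums,
  V1,                                # bare column name, not nums$V1
  into = paste0("col", seq_len(n)),  # one new column name per character
  sep = seq_len(n - 1)               # numeric sep: split after positions 1, 2, ..., n - 1
)
```

The resulting columns are character, so leading zeros survive; passing convert = TRUE to separate() would turn them into integers.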