I am trying to import a .txt dataset that contains only numbers. To import it I use:
k2= read.delim("dummy.txt", stringsAsFactors = F, header = F, sep="",colClasses = "character")
Unfortunately, this returns everything combined in a single column; it did not separate "by nothing". read.table gives the same result.
I am trying to split that column into multiple columns so that every single number is in its own column (there are 297 numbers and all of them should be separate). I tried the following with tidyr:
1- k2 %>% separate(as.character(k2$V1), 1:297, "")
2- apply(k2,2,k2 %>% separate(as.character(k2$V1), 1:297, sep= ""))
But I get this message in both cases: Must extract column with a single valid subscript. x Subscript var has size 555514 but must be size 1.
I would like to either import my dataset with everything already split, or split it afterwards with code.
I am not comfortable with the apply family, so suggestions for good tutorials are also very welcome.
CodePudding user response:
You may scan the file as character, use strsplit on "nothing", rbind the pieces, and type.convert the result.
scan('foo.txt', what='A', quiet=TRUE) |> strsplit('') |> do.call(what=rbind) |>
type.convert(as.is=TRUE) |> as.data.frame()
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
# 1 1 1 1 1 2 2 2 1 9 9
# 2 1 1 2 2 2 9 9 1 1 9
# 3 1 9 1 1 2 9 2 1 2 2
# 4 9 9 2 2 2 2 1 1 2 2
# 5 9 9 1 1 2 2 2 2 2 9
Note: R >= 4.1 is needed for the native pipe |> used here.
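For an R version without the native pipe, the same steps can be written as nested calls. This is only a sketch of the pipeline above: the temporary file and its two lines are made-up stand-ins for the real data, and all lines are assumed to have the same number of characters.

```r
# Made-up example file: each line is a run of single digits
path <- tempfile(fileext = ".txt")
writeLines(c("1112221990", "1122299119"), path)

# 1. read each line as one character string
lines <- scan(path, what = "A", quiet = TRUE)
# 2. split every string into single characters (list of character vectors)
chars <- strsplit(lines, "")
# 3. stack the vectors row-wise into a character matrix
mat <- do.call(rbind, chars)
# 4. convert the characters to numbers and make a data frame (V1, V2, ...)
res <- as.data.frame(type.convert(mat, as.is = TRUE))
res
```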
Data:
set.seed(42)
replicate(5, paste(sample(c(1, 2, 9), 10, replace=TRUE), collapse='')) |>
as.matrix() |> writeLines('foo.txt')
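If the goal is to import the file already split, base R's read.fwf (fixed-width format) may also be worth a try, treating every character as a field of width 1. The sketch below uses a made-up temporary file and assumes every line has the same length; on a file with several hundred thousand rows it may well be slower than the scan approach.

```r
# Made-up example file: each line is a run of single digits
path <- tempfile(fileext = ".txt")
writeLines(c("1112221990", "1122299119", "1911292122"), path)

# Record width, taken from the first line (assumes all lines match it)
n <- nchar(readLines(path, n = 1))

# Read every character as its own fixed-width field of width 1
k2 <- read.fwf(path, widths = rep(1, n))

dim(k2)
head(k2)
```

For the file in the question, n would be 297, giving columns V1 to V297.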
CodePudding user response:
library(dplyr)
# nums <- read.table('nums.txt', colClasses = "character")
Say your imported data has a structure like this:
nums <- structure(list(V1 = c("003111222", "212251256")),
class = "data.frame", row.names = c(NA, -2L))
Then you can iterate over the character positions and put each single-character substring into its own column; together these columns form a data frame.
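outs <- lapply(nums, function(x){
  x <- unname(x)
  # all strings are assumed to have the same width, so use the first one
  n <- nchar(x[1])
  rows <- lapply(seq_len(n), function(i){
    # i-th character of every string (substr is vectorized over x)
    val <- substr(x, i, i)
    data.frame(val)
  }) %>%
    bind_cols
  colnames(rows) <- paste0('col', seq_len(n))
  rows
}) %>% bind_rows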
print(outs)
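As for the error in the question: tidyr's separate expects the name of the column to split as its second argument, not the column's values, which is why it complains about a subscript of size 555514. A possible fix, sketched here on the small nums example above, is to pass the column name and give sep as numeric character positions. The col1, col2, ... names are just illustrative, and all strings are assumed to have the same width.

```r
library(tidyr)

# Stand-in for the imported data: one character column V1
nums <- data.frame(V1 = c("003111222", "212251256"),
                   stringsAsFactors = FALSE)

n <- nchar(nums$V1[1])  # assumes every string has the same width

# `col` is the column name (V1); a numeric `sep` lists the character
# positions to split after, so 1:(n - 1) cuts after every character
res <- separate(nums, V1,
                into = paste0("col", seq_len(n)),
                sep = seq_len(n - 1),
                convert = TRUE)
res
```

With convert = TRUE the new columns are converted from character to numbers; leave it out to keep them as strings.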