I have a dataset with thousands of rows but only one column. It was supposed to have 108 columns. All the values in each row are separated by tabs, and I want to rewrite this data frame with separate columns in R. An example of one row is "A_23_P149050\t-0.78007\t-0.43862\t0.26336\t-0.02076\t-0.11873\t0.30805\t-0.70170\t0.18403\t1.42516\t0.77827\t0.49341\t-0.07636\t0.00152\t0.55901"
This example should become 15 different columns. With strsplit, I get a list, and its length still shows 1.
CodePudding user response:
library(tidyverse)  # attaches tidyr (for separate) and dplyr (for %>%)
df <- data.frame(a = "A_23_P149050\t-0.78007\t-0.43862\t0.26336\t-0.02076\t-0.11873\t0.30805\t-0.70170\t0.18403\t1.42516\t0.77827\t0.49341\t-0.07636\t0.00152\t0.55901")
# Split column a on tabs into 15 new columns named col1 ... col15
df1 <- df %>%
  separate(col = a, into = paste0("col", 1:15), sep = "\\t")
df1
col1 col2 col3 col4 col5
1 A_23_P149050 -0.78007 -0.43862 0.26336 -0.02076
col6 col7 col8 col9 col10
1 -0.11873 0.30805 -0.70170 0.18403 1.42516
col11 col12 col13 col14 col15
1 0.77827 0.49341 -0.07636 0.00152 0.55901
CodePudding user response:
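Using scan:
# what = "character" reads every field as a string; quiet = TRUE suppresses the "Read N items" message
scan(what = "character", sep = "\t", quiet = TRUE, text = "A_23_P149050\t-0.78007\t-0.43862\t0.26336\t-0.02076\t-0.11873\t0.30805\t-0.70170\t0.18403\t1.42516\t0.77827\t0.49341\t-0.07636\t0.00152\t0.55901")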
# [1] "A_23_P149050" "-0.78007" "-0.43862" "0.26336" "-0.02076" "-0.11873" "0.30805" "-0.70170"
# [9] "0.18403" "1.42516" "0.77827" "0.49341" "-0.07636" "0.00152" "0.55901"
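scan returns one flat character vector, which suits a single string. For a whole column of tab-separated rows, a base R sketch along the same lines is to pass the column to read.table through its text argument. The data frame raw, its column a, and its two short rows below are hypothetical stand-ins for the real data, and the sketch assumes every row has the same number of fields.

```r
# Hypothetical stand-in for the real one-column data frame
raw <- data.frame(a = c("id1\t0.5\t-1.25", "id2\t2\t3.75"))

# Each element of raw$a is parsed as one tab-separated line.
# quote = "" and comment.char = "" keep quotes and '#' as literal data.
wide <- read.table(text = raw$a, sep = "\t", header = FALSE,
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

str(wide)
```

With header = FALSE the columns get the default names V1, V2, and so on, and read.table guesses the column types, so the numeric fields come back as numbers rather than strings.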
CodePudding user response:
A data.table option:
df <- data.frame(V1 = c("A_23_P149050\t-0.78007\t-0.43862\t0.26336\t-0.02076\t-0.11873\t0.30805\t-0.70170\t0.18403\t1.42516\t0.77827\t0.49341\t-0.07636\t0.00152\t0.55901"))
library(data.table)
setDT(df)[, paste0("V", 1:15) := tstrsplit(V1, "\\t")]
df
#> V1 V2 V3 V4 V5 V6 V7 V8
#> 1: A_23_P149050 -0.78007 -0.43862 0.26336 -0.02076 -0.11873 0.30805 -0.70170
#> V9 V10 V11 V12 V13 V14 V15
#> 1: 0.18403 1.42516 0.77827 0.49341 -0.07636 0.00152 0.55901
Created on 2022-07-13 by the reprex package (v2.0.1)
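On the strsplit attempt from the question: strsplit returns a list with one element per input string, so a single row gives a list of length 1 whose only element holds the separate pieces. A minimal base R sketch, using a hypothetical character vector x in place of the real column and assuming every row has the same number of fields, binds those elements into rows:

```r
# Hypothetical stand-in for the real tab-separated column
x <- c("id1\t0.5\t-1.25", "id2\t2\t3.75")

# One list element per input string; each element holds that row's fields
parts <- strsplit(x, "\t", fixed = TRUE)
length(parts)   # number of rows, not number of fields
lengths(parts)  # number of fields in each row

# Stack the per-row vectors into a character matrix, then a data frame
wide <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Convert columns that look numeric; as.is = TRUE keeps the IDs as character
wide[] <- lapply(wide, type.convert, as.is = TRUE)
str(wide)
```

If rows can have different numbers of fields, rbind will recycle values with a warning, so it is worth checking lengths(parts) first.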