Home > Net >  Import data in R from a text file containing multi header rows and whitespaces
Import data in R from a text file containing multi header rows and whitespaces

Time:01-06

I want to upload a data file that contains multiple header rows and whitespaces into R. I have copied the data here

# File name
#
#@   1  "Some text"                                                   "aa"
#@   2  "Some text"                                                   "bb"
#@   3  "Some text"                                                   "cc"
#@   4  "Some text"                                                   "dd"
#@   5  "Some text"                                                   "ee"
#@   6  "Some text"                                                   "ff"

#
#
#
#

 0.000000e 00  0.000000e 00  0.000000e 00  0.000000e 00  0.000000e 00  0.000000e 00  
 1.000000e-03  3.727051e-04  2.532203e-04  4.736003e-04  3.727051e-07  0.000000e 00   
 2.000000e-03  2.266785e-03  1.540081e-03  2.880429e-03  2.639490e-06  0.000000e 00  
 3.000000e-03  7.538553e-03  5.121786e-03  9.579321e-03  1.017804e-05  0.000000e 00   
 4.000000e-03  1.838835e-02  1.249329e-02  2.336627e-02  2.856639e-05  0.000000e 00  
 5.000000e-03  3.703296e-02  2.516073e-02  4.705817e-02  6.559935e-05  0.000000e 00 
 6.000000e-03  2.266785e-03  1.540081e-03  2.880429e-03  2.639490e-06  0.000000e 00  
 7.000000e-03  7.538553e-03  5.121786e-03  9.579321e-03  1.017804e-05  0.000000e 00   
 8.000000e-03  1.838835e-02  1.249329e-02  2.336627e-02  2.856639e-05  0.000000e 00  
 9.000000e-03  3.703296e-02  2.516073e-02  4.705817e-02  6.559935e-05  0.000000e 00

As you can see, it contains 9 rows and 6 columns. The issue is there is bunch of header data on top of the actual data set that contains the column names. I tried using the read_table command and whitespace delimiter, but the columns were limited to 3, instead of 6

dat <- read_table("example.txt", col_names = F)

I would like to import the data, convert to numeric and label each column with the header names listed in the file. So name of the first column would be "Some text aa", 2nd column would be "Some text bb" and so on. Is there a way to get this? I have 100 such files and would like to automate the process

CodePudding user response:

If the file is quux.txt, then

dat <- read.table("quux.txt", skip = 13, header = FALSE)
dat
#      V1           V2           V3           V4           V5 V6
# 1 0.000 0.0000000000 0.0000000000 0.0000000000 0.000000e 00  0
# 2 0.001 0.0003727051 0.0002532203 0.0004736003 3.727051e-07  0
# 3 0.002 0.0022667850 0.0015400810 0.0028804290 2.639490e-06  0
# 4 0.003 0.0075385530 0.0051217860 0.0095793210 1.017804e-05  0
# 5 0.004 0.0183883500 0.0124932900 0.0233662700 2.856639e-05  0
# 6 0.005 0.0370329600 0.0251607300 0.0470581700 6.559935e-05  0

From here, you can extract the column names with

nms <- readLines(pipe("grep '^#@' quux.txt")) |>
  sub(pattern = '.*"(\\S )"\\s*$', replacement = '\\1')
nms
# [1] "aa" "bb" "cc" "dd" "ee" "ff"
names(dat) <- nms
dat
#      aa           bb           cc           dd           ee ff
# 1 0.000 0.0000000000 0.0000000000 0.0000000000 0.000000e 00  0
# 2 0.001 0.0003727051 0.0002532203 0.0004736003 3.727051e-07  0
# 3 0.002 0.0022667850 0.0015400810 0.0028804290 2.639490e-06  0
# 4 0.003 0.0075385530 0.0051217860 0.0095793210 1.017804e-05  0
# 5 0.004 0.0183883500 0.0124932900 0.0233662700 2.856639e-05  0
# 6 0.005 0.0370329600 0.0251607300 0.0470581700 6.559935e-05  0
  • Related