I want to upload a data file that contains multiple header rows and whitespaces into R. I have copied the data here
# File name
#
#@ 1 "Some text" "aa"
#@ 2 "Some text" "bb"
#@ 3 "Some text" "cc"
#@ 4 "Some text" "dd"
#@ 5 "Some text" "ee"
#@ 6 "Some text" "ff"
#
#
#
#
0.000000e 00 0.000000e 00 0.000000e 00 0.000000e 00 0.000000e 00 0.000000e 00
1.000000e-03 3.727051e-04 2.532203e-04 4.736003e-04 3.727051e-07 0.000000e 00
2.000000e-03 2.266785e-03 1.540081e-03 2.880429e-03 2.639490e-06 0.000000e 00
3.000000e-03 7.538553e-03 5.121786e-03 9.579321e-03 1.017804e-05 0.000000e 00
4.000000e-03 1.838835e-02 1.249329e-02 2.336627e-02 2.856639e-05 0.000000e 00
5.000000e-03 3.703296e-02 2.516073e-02 4.705817e-02 6.559935e-05 0.000000e 00
6.000000e-03 2.266785e-03 1.540081e-03 2.880429e-03 2.639490e-06 0.000000e 00
7.000000e-03 7.538553e-03 5.121786e-03 9.579321e-03 1.017804e-05 0.000000e 00
8.000000e-03 1.838835e-02 1.249329e-02 2.336627e-02 2.856639e-05 0.000000e 00
9.000000e-03 3.703296e-02 2.516073e-02 4.705817e-02 6.559935e-05 0.000000e 00
As you can see, it contains 9 rows and 6 columns. The issue is there is bunch of header data on top of the actual data set that contains the column names. I tried using the read_table command and whitespace delimiter, but the columns were limited to 3, instead of 6
dat <- read_table("example.txt", col_names = F)
I would like to import the data, convert to numeric and label each column with the header names listed in the file. So name of the first column would be "Some text aa", 2nd column would be "Some text bb" and so on. Is there a way to get this? I have 100 such files and would like to automate the process
CodePudding user response:
If the file is quux.txt
, then
dat <- read.table("quux.txt", skip = 13, header = FALSE)
dat
# V1 V2 V3 V4 V5 V6
# 1 0.000 0.0000000000 0.0000000000 0.0000000000 0.000000e 00 0
# 2 0.001 0.0003727051 0.0002532203 0.0004736003 3.727051e-07 0
# 3 0.002 0.0022667850 0.0015400810 0.0028804290 2.639490e-06 0
# 4 0.003 0.0075385530 0.0051217860 0.0095793210 1.017804e-05 0
# 5 0.004 0.0183883500 0.0124932900 0.0233662700 2.856639e-05 0
# 6 0.005 0.0370329600 0.0251607300 0.0470581700 6.559935e-05 0
From here, you can extract the column names with
nms <- readLines(pipe("grep '^#@' quux.txt")) |>
sub(pattern = '.*"(\\S )"\\s*$', replacement = '\\1')
nms
# [1] "aa" "bb" "cc" "dd" "ee" "ff"
names(dat) <- nms
dat
# aa bb cc dd ee ff
# 1 0.000 0.0000000000 0.0000000000 0.0000000000 0.000000e 00 0
# 2 0.001 0.0003727051 0.0002532203 0.0004736003 3.727051e-07 0
# 3 0.002 0.0022667850 0.0015400810 0.0028804290 2.639490e-06 0
# 4 0.003 0.0075385530 0.0051217860 0.0095793210 1.017804e-05 0
# 5 0.004 0.0183883500 0.0124932900 0.0233662700 2.856639e-05 0
# 6 0.005 0.0370329600 0.0251607300 0.0470581700 6.559935e-05 0