I'm attempting to read csv files that do not have a header. Each file has 11 columns. If data for a variable is missing for a time, it is left blank. If the first row has all of the observations (values in all 11 columns), it is simple to read all of these files. However, if data is absent in the first row, the file is not imported correctly.
I tried with following code and even with fread
also.
temp = list.files(path = "/home/DATA", pattern = "4.*", full.names = TRUE)
df <- lapply(1:length(temp), function(i) read.table(temp[i], header = FALSE, fill = TRUE))
For instance, the standard file looks like this:
1007.0 14 25.0 23.6 92 18.61 0 0 297.6 351.7 300.9
1000.0 70 25.0 23.4 91 18.51 0 0 298.1 352.1 301.4
946.0 557 24.4 18.4 69 14.29 6 2 302.3 344.8 304.9
895.0 1041 21.6 17.9 79 14.64 12 5 304.2 348.1 306.9
850.0 1486 20.0 3.0 32 5.62 18 7 307.1 324.6 308.1
However, if data is missing for several levels or full columns then it is like sample_data1
1001.0 14
968.1 304 290 23
934.6 609 300 33
850.0 1431 17.8 260 23 304.8 304.8
700.0 3069 10.6 355 11 314.2 314.2
or sample_data2
1004.0 64 -20.1 252.8 252.8
1000.0 95 20.2 293.4 293.4
950.0 533 21.8 15.8 69 12.02 299.3 334.6 301.5
600.0 4333 1.0 -1.2 85 5.88 317.2 336.2 318.4
or some files are like sample_data3
1000.0
850.0
700.0 8.3 311.6 311.6
500.0
using above code sample_data1 is imported into 7 columns, sample_data2 is imported into 9 columns and sample_data3 is imported into 4 columns.
How can I import all these csv files in 11 columns and NA
for the missing values.
CodePudding user response:
You can do it using the readr
library:
readr::read_csv(
file,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
This function has a na
argument where you can specify which values should be replaced by NA
. The default is an empty field ("").
CodePudding user response:
Did you try using read.csv
instead of read.table
?
Use read.csv
. This will replace blank rows and columns with NA
's and if you want spaces in place of NA
's then use colClasses="character"
.
temp = list.files(path = "/home/DATA", pattern = "4.*", full.names = TRUE)
df <- lapply(1:length(temp), function(i) read.csv(temp[i], header = FALSE))
#To read blanks as spaces
df <- lapply(1:length(temp), function(i) read.csv(temp[i], header = FALSE, colClasses = "character"))