Home > Software design >  How to import csv files without header and missing values in the first row
How to import csv files without header and missing values in the first row

Time:09-04

I'm attempting to read csv files that do not have a header. Each file has 11 columns. If data for a variable is missing for a time, it is left blank. If the first row has all of the observations (values in all 11 columns), it is simple to read all of these files. However, if data is absent in the first row, the file is not imported correctly.

I tried with following code and even with fread also.

temp = list.files(path = "/home/DATA", pattern = "4.*", full.names = TRUE)

df <- lapply(1:length(temp), function(i) read.table(temp[i], header = FALSE, fill = TRUE))

For instance, the standard file looks like this:

 1007.0     14   25.0   23.6     92  18.61      0      0  297.6  351.7  300.9
 1000.0     70   25.0   23.4     91  18.51      0      0  298.1  352.1  301.4
  946.0    557   24.4   18.4     69  14.29      6      2  302.3  344.8  304.9
  895.0   1041   21.6   17.9     79  14.64     12      5  304.2  348.1  306.9
  850.0   1486   20.0    3.0     32   5.62     18      7  307.1  324.6  308.1

However, if data is missing for several levels or full columns then it is like sample_data1

 1001.0     14
  968.1    304                                290     23
  934.6    609                                300     33
  850.0   1431   17.8                         260     23  304.8         304.8
  700.0   3069   10.6                         355     11  314.2         314.2

or sample_data2

 1004.0     64  -20.1                                     252.8         252.8
 1000.0     95   20.2                                     293.4         293.4
  950.0    533   21.8   15.8     69  12.02                299.3  334.6  301.5
  600.0   4333    1.0   -1.2     85   5.88                317.2  336.2  318.4

or some files are like sample_data3

 1000.0
  850.0
  700.0           8.3                                     311.6         311.6
  500.0

using above code sample_data1 is imported into 7 columns, sample_data2 is imported into 9 columns and sample_data3 is imported into 4 columns. How can I import all these csv files in 11 columns and NA for the missing values.

CodePudding user response:

You can do it using the readr library:

readr::read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

This function has a na argument where you can specify which values should be replaced by NA. The default is an empty field ("").

CodePudding user response:

Did you try using read.csv instead of read.table? Use read.csv. This will replace blank rows and columns with NA's and if you want spaces in place of NA's then use colClasses="character".

temp = list.files(path = "/home/DATA", pattern = "4.*", full.names = TRUE)
df <- lapply(1:length(temp), function(i) read.csv(temp[i], header = FALSE))

#To read blanks as spaces
df <- lapply(1:length(temp), function(i) read.csv(temp[i], header = FALSE, colClasses = "character"))
  • Related