Home > front end >  Open a text file containing a table with no clear separator
Open a text file containing a table with no clear separator

Time:02-08

I have to open a text file which contain data that I want to open as a tibble or dataframe. The data are stored in a tbale with white space as separator (I suppose). The probleme is that whitespace is also used in the column "PI". I have no idea how to open it without doing any complicated script to scan all of it and parse it with regex. Do you have any tips ? I think it was designed to have visualy aligned column when we open it as a text file.

here is a sample of the file.

  DAC                                                                PI    SPROF_DATE_UPDATE DATA_MODE      WMO CYCLE         PROFILE_DATE      LON     LAT        MPD_RAW      MPD_ADJ        Z_RAW       Z_ADJ
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2021-05-12 18:52:52         R  5903612   038  2012-07-04 01:15:50    8.651 -41.466     -89.313473          NaN     5.439052         NaN
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2020-12-03 02:03:33         R  5903717   138  2014-02-03 18:07:45 -164.533 -69.958     134.866767          NaN     7.884637         NaN
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2021-05-12 22:44:39         R  5903717   139  2014-02-10 20:05:49 -164.302 -70.061      99.848464          NaN     5.803396         NaN
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2021-05-12 22:44:53         R  5903717   140  2014-02-17 22:06:53 -164.146 -70.246     107.535752          NaN     6.260275         NaN

CodePudding user response:

Fixing @danlooo
First: header separator is single space.
Second: sometimes before - there is just a single space

data = readLines("testdata.txt") 
data[1] = gsub(pattern ="\\s " ,replacement = "\t" ,x = data[1])
modified_data = gsub(pattern ="\\s-" ,replacement = "  -" ,x = data)
modified_data = gsub(pattern ="\\s{2,}" ,replacement = "\t" ,x = modified_data)
final_data = paste0(modified_data,collapse = "\n")
write(final_data, "finaldata.txt")
finaldatatable <- read_delim("finaldata.txt", 
                        delim = "\t", escape_double = FALSE, 
                        trim_ws = TRUE)

CodePudding user response:

  1. Read the file as lines of text
  2. Replace multiple consecutive spaces with a custom and unique delimiter
  3. Paste everything into a character vector
  4. read from that data using the custom delimiter
  5. Additional re-formatting like trimming and type conversion
library(tidyverse)
read_lines("data.txt") %>%
  map_chr(~ .x %>% str_replace_all("(  ) ", ";")) %>%
  paste0(collapse = "\n") %>%
  read_delim(delim = ";", col_types = cols(.default = "character")) %>%
  mutate(across(everything(), str_trim)) %>%
  type_convert()
  •  Tags:  
  • Related