I have an enormous text file that contains one long string that I'm trying to import into R as a data frame.
The text file containing the data is from
https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/new.data
Essentially, the text file is one long string of values separated by spaces. Is there a way to convert all the spaces into commas using R so that I could then use read_csv?
I tried importing it as a TSV, but that didn't work, and read_delim also failed because of how the text file is formatted.
Does anyone have any leads on how I could import this text file into a simple data frame? The first two records from the text file are shown below, one per line so they can be told apart.
1 15943882 63 1 -9 -9 -9 -27 1 145 1 233 -9 50 20 1 0 1 2 2 3 1981 0 0 0 0 0 1 10.5 6 13 150 60 190 90 145 85 0 0 2.3 3 -9 -9 0 -9 -9 -9 -9 -9 -9 6 -9 -9 -9 2 16 1981 0 1 1 1 -9 1 -9 1 -9 1 1 1 1 1 1 1 -9 -9 0 -9 -9 -9 -9 -9 -9 -9 -9 -9 0 0 0 0 name
2 15964847 67 1 -9 -9 -9 -27 4 160 1 286 -9 40 40 0 0 1 2 3 5 1981 0 1 0 0 0 1 9.5 6 13 108 64 160 90 160 90 1 0 1.5 2 -9 -9 3 -9 -9 -9 -9 -9 -9 3 -9 -9 -9 2 5 1981 2 1 2 2 -9 2 -9 1 -9 1 1 1 1 1 1 1 -9 -9 0 -9 -9 -9 -9 -9 -9 -9 -9 -9 0 0 0 0 name
Thank You!
CodePudding user response:
There's probably a much more efficient way, but this seems to do the trick. You'll have to name the columns yourself (the data doesn't seem to have column names; there's a short sketch of that after the code).
library(dplyr)
library(tibble)
library(readr)
# Determined by looking at the file. Not sure if
# there's a way to determine this automatically
line_per_chunk <- 12L
# Read the whole file into a character vector, one element per line
data <- read_lines('new.data')
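# Optional sanity check, assuming every record's last line ends with the
# literal token "name" (as in the two sample records in the question): the
# first line containing that token should then be line number line_per_chunk
stopifnot(grep("name", data, fixed = TRUE)[1] == line_per_chunk)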
# Combine every group of 12 lines into a single string
# (using a space as a delimiter to match the rest of the file)
joined_data <- data %>%
  # Make the character vector a data frame, with a row number column
  enframe(name = 'row', value = 'raw_data') %>%
  # Based on https://stackoverflow.com/a/66732944/1714
  group_by(chunk = (row - 1) %/% line_per_chunk) %>%
  summarise(joined = paste(raw_data, collapse = ' '))
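# Optional check, assuming values are separated only by whitespace: every
# joined record should have the same number of fields, so this table should
# show a single field count
table(lengths(strsplit(joined_data$joined, "\\s+")))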
# Based on https://stackoverflow.com/a/8464885/1714
results <- read.table(textConnection(joined_data[["joined"]]), sep = ' ')
results
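If you also want named columns, or the comma-separated file the question originally asked about, something along these lines should work once results exists. The column names below are just placeholders; the dataset's actual attribute names are described in the heart-disease.names file in the same UCI directory.
# Placeholder names (attr_1, attr_2, ...); swap in the attribute names from
# the dataset's documentation if you need meaningful ones
names(results) <- paste0("attr_", seq_along(results))
# Write the data out as a regular CSV, which can be re-read later with read_csv
write_csv(results, 'new.csv')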