I have several .txt files that I would like to read and then rbind in R. I expect each .txt file to yield 1 row and 115 columns. First problem: I'm getting the following warning message: “incomplete final line found by readTableHeader on…” But I have many files, and I can't open each one, navigate to its last line and press Enter. Some solutions I found on the Internet didn't work because of the second problem below.
Second problem: the column names and the column contents are not laid out as a header row followed by data rows. The .txt files look like this: "DIARREIA":1,"DISPNEIA":2, where "DIARREIA" and "DISPNEIA" are column names while 1 and 2 are column contents. There is a colon (:) between each column name and its content, and a comma between pairs.
Here is my code; 2 example files are available at https://drive.google.com/drive/folders/16U8J12Ld7PI5DI-ph_2QCysTxFGKZ-QP?usp=share_link.
```
setwd("C:/User/BOX")
unzip("C:/User/BOX/data.zip")
list.files()
temp = list.files(pattern = "*.txt")
df = do.call("rbind", lapply(temp, function(x) read.table(x, stringsAsFactors = T, header = TRUE)))
```
Any help, please? Thanks in advance!
CodePudding user response:
Hello Baptista: install jsonlite if you haven't already, and try this:
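# install jsonlite only if it is not already installed
if (!requireNamespace("jsonlite", quietly = TRUE)) install.packages("jsonlite")
setwd("C:/User/BOX")
unzip("C:/User/BOX/data.zip")
# list.files() expects a regular expression, not a glob
temp <- list.files(pattern = "\\.txt$")
# read_json() returns one list per file; rbind gives a list-matrix with one row per file
df <- do.call("rbind", lapply(temp, jsonlite::read_json))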
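If you need a plain data frame rather than a list-matrix, something along these lines might work. It is only a sketch: the two files written to tempdir() are made-up stand-ins for the real data, read_one is a hypothetical helper name, and it assumes every file holds a single flat JSON object with the same keys and no nested arrays.

```r
library(jsonlite)

# Hypothetical sample files standing in for the real data: each holds one
# flat JSON object and has no trailing newline.
dir <- file.path(tempdir(), "json_demo")
dir.create(dir, showWarnings = FALSE)
cat('{"DIARREIA":1,"DISPNEIA":2,"OUTROS":"Febre"}', file = file.path(dir, "subject1.txt"))
cat('{"DIARREIA":2,"DISPNEIA":1,"OUTROS":""}', file = file.path(dir, "subject2.txt"))

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: turn one file into a one-row data frame.
read_one <- function(path) {
  # warn = FALSE silences the "incomplete final line" warning
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  rec <- fromJSON(txt)
  # JSON null arrives as NULL; replace it with NA so every field has length 1
  rec <- lapply(rec, function(v) if (is.null(v)) NA else v)
  row <- as.data.frame(rec, stringsAsFactors = FALSE)
  # keep the file name (without extension) as an identifier column
  data.frame(subject = tools::file_path_sans_ext(basename(path)), row,
             stringsAsFactors = FALSE)
}

df <- do.call(rbind, lapply(files, read_one))
print(df)
```

Reading the text with readLines(warn = FALSE) first is what sidesteps the warning from the question. If the real files contain keys whose values are arrays or nested objects, as.data.frame() will not produce a single row and the helper would need adjusting.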
CodePudding user response:
You've found yourself some medical records that look a lot like Debian Control File (DCF) data. See ?read.dcf for the explanation of a properly formed .dcf file. You can get this result:
subject1_2_4
subject PERDADEPALADAR1 PERDADEPALADAR ALTOFLUXOCATETERNASAL
1 1 false, false 1 false
2 2 NA 2 false
INSUFICINCIARENAL1 DATADEALTADAUTI DATADEADMISSOUTI
1 false
2 false 9\\/17\\/2020 12:00:00 AM 9\\/12\\/2020 12:00:00 AM
IMUNOMODULADORQUAIS DATADAALTA SITUAODOCASODESRAG DIARREIA
1 10\\/6\\/2020 12:00:00 AM 0 1
2 9\\/19\\/2020 12:00:00 AM 1 2
DESFECHODOPARTO CLOROQUINAHIDROXICLOROQUINA LINFOCITOPENIA1
1 -1 false false
2 -1 false false
OUTROSSINTOMASPERSISTENTES PO2 DISPNEIA OXIGENOTERAPIA
1 Ansiedade false 2 true
2 false 1 true
INSUFICINCIARESPIRATRIA PROFISSIONALDESADE TRIGLICRIDES FERRITINA1
1 0 2 false false
2 1 0 false false
DATAADMISSAO TOSSE1 DOENAHEMATOLGICACRNICA DDIMERO1 PARTO
1 9\\/24\\/2020 12:00:00 AM false false false 0
2 9\\/16\\/2020 12:00:00 AM false false true 0
COINFECOES SNDROMEDEDOWN PERDADEOLFATO DIABETESMELLITUS RENDAFAMILIAR
1 1 false 1 true
2 1 false 2 false
SATURAOO2 VENTILAOMECNICAINVASIVA DDIMERO
1 96 false false
2 96 false true
ANTIBITICOSQUAISETEMPODEUSO
1 Ceftriaxona 2g 24\\/24h 3d\nTazocin 4.5mg 6\\/6h 7d
2 Azitromicina 500mg 24\\/24h 5d\nCeftriaxona 1g 24\\/24h 7d
TRABALHODEPARTOPREMATURO VENTILAOMECNICAEMPOSIOPRONA OUTRASCAUSASDEADMISSOUTI
1 0 false
2 0 false
OUTRASSEQUELAS DATARESULTADOCONFIRMATRIOPARACOVID TOSSE DOENCAHEPTICACRNICA
1 8\\/1\\/2020 12:00:00 AM 2 false
2 9\\/17\\/2020 12:00:00 AM 1 false
PROTENACREATIVA1 ARTRALGIADORNASARTICULAES ENCAMINHAMETODEOUTROSERVIO ASMA
1 false false 2 false
2 false false 2 false
TRIMESTREDEGESTACAO PO21 INSUFICINCIARESPIRATRIA1 TIPODEPARTO OBESIDADE
1 false false -1 false
2 false true -1 false
FRAQUEZA OUTROS VOMITO DHLLDL1 IVERMECTINA
1 false Febre\ncoriza 1 false false
2 false Piora do quadro geral 2 false false
DIAGNSTICOCLNICOINICIAL ADMISSOUTI ALTOFLUXOMASCARA VITAMINAC FADIGA
1 Pneumonia e COVID 2 false false 2
2 Pneumonia e COVID 1 true false 2
PROTENACREATIVA VITAMINAD QUAISCOINFECES IMUNODEFICINCIA COCLHICINA
1 false false Pneumonia false false
2 false false Pneumonia bacteriana false false
ONDEFOIREALIZADOOPRIMEIROATENDIMENTODOPACIENTE
1 6
2 6
ANTICOAGULANTEQUAISETEMPODEUSO1 CONTATODE FALNCIADERGOS SEPSE PERDADEOLFATO1
1 Clexane 40mg 24\\/24h 12d 0 false 0 false
2 Clexane 40mg 24\\/24h 7d 1 false 0 false
INSUFICINCIARENAL EXPOSICAO DORABDOMINAL CHOQUE TCNAINTERNAO
1 0 -1 2 false 0
2 0 -1 2 false 2
DESCONFORTORESPIRATRIO DHLLDL ANTIVIRAISQUAISETEMPODEUSO NITAXOZANIDA
1 2 false false
2 2 false false
DATA SEPSE1 DOENANEUROLGICACRNICA ZINCO PACIENTEGESTANTE
1 8\\/27\\/2022 12:00:00 AM false false false 0
2 8\\/26\\/2022 12:00:00 AM false false false 0
OUTROSSINAISDEGRAVIDADE TIPODEEXAME DOENCACARDIOVASCULARCRNICA
1 0 true
2 0 false
PARALISIADEDOENTECRTICO DOENARENALCRNICA1 TEMPERATURA
1 false false 36\n9
2 false false 36\n5
FATORESDERISCOPARAGRAVIDADEEMGESTANTE INSUFICINCIACARDACA TRIGLICRIDES1
1 -1 false false
2 -1 false false
FALTADEAR AMNSIAESQUECIMENTO CORTICOIDESQUAISETEMPODEUSO LINFOCITOPENIA
1 false false Dexametasona 6mg 24\\/24h 10d false
2 false false Dexametasona 6mg 24\\/24h 7d false
OUTRAPNEUMOPATIACRNICA DORDEGARGANTA DESFECHOCLNICODOPACIENTE FIBROSEPULMONAR
1 false 2 1 false
2 false 2 1 false
BAIXOFLUXOCATETERNASAL RACA MIALGIADORNOCORPO DOENARENALCRNICA FERRITINA SEXO
1 true -1 false false false 0
2 true -1 false false false 0
PARADACARDIORRESPIRATRIA MIALGIA PURPERA ESPECTROCLNICOADMISSO TROMBOSE
1 false 2 false 1 false
2 false 2 false 1 false
ENDERECOTIPO
1 0
2 0
>
But there is a certain amount of mucking around to do first. It can be done in R, though it is likely easier in a text editor. With the .dcf rules in mind, we might try the following (having already copied and pasted subject1 and subject2 into one text file):
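# each step works on the result of the previous one
subject1_2_step1 <- gsub('\\{', '', subject1_2)        # drop opening braces
subject1_2_step2 <- gsub('\\}', '', subject1_2_step1)  # drop closing braces
subject1_2_step3 <- gsub(',', '\n', subject1_2_step2)  # one tag:value pair per line
subject1_2_step4_dcf <- read.dcf(textConnection(subject1_2_step3), all = TRUE)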
Error in read.dcf(textConnection(subject1_2_step3), all = TRUE) :
Invalid DCF format.
Regular lines must have a tag.
Offending lines start with:
list(c("false
9\"
"false
5\"
))
It is easier to see in a text editor that these (9 and 5) are continuations of the prior tag:value pair, perhaps a clinician criticality indication, and should have a space before them. You could use a regex to find them and insert the spaces, but in the end you still wouldn't have subject:1 or subject:2 as seen above, because those aren't in the records; they're the file names. The same could likely be said for jsonlite. I also replaced every '"' with '' for easier column name reading.
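As a possible way to do the mucking around in base R instead of a text editor, here is a rough sketch that pulls out the "TAG":value pairs with a regular expression rather than splitting on every comma, so a comma inside a quoted value does not start a new line. The two records are invented examples in the layout from the question (the decimal comma in "36,9" is a guess about the raw data), parse_record is a hypothetical helper name, and the sketch assumes values never contain an escaped double quote and that every record lists the same tags in the same order.

```r
# Hypothetical records in the layout described in the question.
records <- list(
  subject1 = '{"DIARREIA":1,"TEMPERATURA":"36,9","DATA":"8\\/27\\/2022 12:00:00 AM","ASMA":false}',
  subject2 = '{"DIARREIA":2,"TEMPERATURA":"36,5","DATA":"8\\/26\\/2022 12:00:00 AM","ASMA":false}'
)

# Hypothetical helper: return a named character vector of tag -> value.
parse_record <- function(txt) {
  # a quoted tag, a colon, then either a quoted value or a bare value
  pattern <- '"([^"]+)":("[^"]*"|[^,}]+)'
  pairs <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
  key   <- sub('^"([^"]+)":.*', '\\1', pairs)     # text inside the first quotes
  value <- sub('^"[^"]+":', '', pairs)            # everything after the first colon
  value <- gsub('^"|"$', '', value)               # strip the surrounding quotes
  value <- gsub("\\/", "/", value, fixed = TRUE)  # undo the escaped slashes
  setNames(value, key)
}

rows <- lapply(records, parse_record)
# rbind matches by position, hence the same-tags-same-order assumption;
# the list names become the row names, standing in for the file names
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
# let R guess column types; values such as "36,9" stay character
df[] <- lapply(df, type.convert, as.is = TRUE)
print(df)
```

With real files, each record could be read with paste(readLines(path, warn = FALSE), collapse = ""), which also avoids the incomplete final line warning, and the list could be named after the files so that the subject identifiers survive as row names.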