Error in file(con, "r") in R when reading a .txt file from a URL

Time:03-29

I want to read a .txt file from a URL. I run the following:

readLines(paste0("https://www.sec.gov/Archives/", All_file_today[Var]))

where All_file_today[Var] holds the relative path 'edgar/data/99189/0001567619-22-004329.txt'.

But it returns the error:

Error in file(con, "r") : 
  cannot open the connection to 'https://www.sec.gov/Archives/edgar/data/99189/0001567619-22-004329.txt'

When I copy this link into a web browser, it displays the content I am looking for without any problem. Does anyone know what I am doing wrong?
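A likely cause, assuming the server is refusing the request rather than the file being missing: sec.gov appears to reject requests that carry R's default User-Agent string, which is why a browser shows the file while readLines() does not. SEC's published access guidance asks automated clients to declare a User-Agent containing a name and contact email. Below is a minimal base-R sketch (no httr) that temporarily sets the HTTPUserAgent option before opening the url() connection. It assumes R was built with libcurl support; edgar_url() and read_edgar_txt() are hypothetical helper names, and the User-Agent value in the usage line is a placeholder.

```r
# Hypothetical helper: prepend the EDGAR archive root to a relative path
# such as the one stored in All_file_today[Var].
edgar_url <- function(path) {
  paste0("https://www.sec.gov/Archives/", path)
}

# Hypothetical helper: read an EDGAR .txt filing line by line with base R.
# options(HTTPUserAgent = ...) is set temporarily so the libcurl-based url()
# connection sends this User-Agent instead of R's default one.
read_edgar_txt <- function(path, user_agent) {
  old <- options(HTTPUserAgent = user_agent)
  on.exit(options(old))                      # restore the previous User-Agent
  con <- url(edgar_url(path), method = "libcurl")
  on.exit(close(con), add = TRUE)            # destroy the connection afterwards
  readLines(con, warn = FALSE)               # readLines() opens con itself
}
```

Usage, with a placeholder User-Agent you would replace with your own details: filing <- read_edgar_txt(All_file_today[Var], "Your Name your.email@example.com")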

Following Nad's answer below, I ran the following:

> user <- paste('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7), AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36')
> res <- GET(url, add_headers(`User-Agent` = user, Connection = 'keep-alive'))
> res
Response [https://www.sec.gov/Archives/edgar/data/1000097/0000919574-15-002406.txt]
  Date: 2022-03-29 01:32
  Status: 200
  Content-Type: text/plain
  Size: 5.44 kB
<SEC-DOCUMENT>0000919574-15-002406.txt : 20150225
<SEC-HEADER>0000919574-15-002406.hdr.sgml : 20150225
<ACCEPTANCE-DATETIME>20150225160223
ACCESSION NUMBER:       0000919574-15-002406
CONFORMED SUBMISSION TYPE:  13F-HR/A
PUBLIC DOCUMENT COUNT:      2
CONFORMED PERIOD OF REPORT: 20141231
FILED AS OF DATE:       20150225
DATE AS OF CHANGE:      20150225
EFFECTIVENESS DATE:     20150225
...
> readLines(content(res))
No encoding supplied: defaulting to UTF-8.
Error in file(con, "r") : cannot open the connection

From the above, I understand that I can reach the file (status 200), but readLines() still fails. What could be the reason?
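The status 200 shows the download itself worked; the failure happens afterwards. For a text/plain response, httr's content(res) returns the whole body as a single character string (the "No encoding supplied" message comes from that text parsing), and readLines() interprets a character string as a file name. It therefore tries to open a local file named after the entire body and fails with "cannot open the connection". The offline sketch below reproduces this using the first lines of the response shown above as a stand-in for content(res, as = "text"), then shows two ways to split the string into lines.

```r
# Stand-in for content(res, as = "text"): the first lines of the response
# body shown in the question, joined into ONE string as httr returns it.
body <- paste(
  "<SEC-DOCUMENT>0000919574-15-002406.txt : 20150225",
  "<SEC-HEADER>0000919574-15-002406.hdr.sgml : 20150225",
  "<ACCEPTANCE-DATETIME>20150225160223",
  "ACCESSION NUMBER:       0000919574-15-002406",
  sep = "\n"
)

# readLines(body) treats the string as a file path, so it errors out.
failed <- inherits(suppressWarnings(try(readLines(body), silent = TRUE)),
                   "try-error")

# Option 1: wrap the string in a text connection so readLines() splits it.
tc <- textConnection(body)
lines1 <- readLines(tc)
close(tc)

# Option 2: split on line breaks directly (also handles "\r\n").
lines2 <- strsplit(body, "\r?\n")[[1]]
```

With the real response, the equivalent would be readLines(textConnection(content(res, as = "text", encoding = "UTF-8"))).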

Answer:

We can read the file with the httr package, sending a browser-like User-Agent header with the request:

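library(httr)

url <- 'https://www.sec.gov/Archives/edgar/data/99189/0001567619-22-004329.txt'

user <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
              'Gecko/20100101 Firefox/98.0')

res <- GET(url, add_headers(`User-Agent` = user, Connection = 'keep-alive'))
stop_for_status(res)  # stop with an informative error if the request failed

# content(res) returns the whole body as ONE string; readLines() would treat
# that string as a file name, so read it through a textConnection() instead
lines <- readLines(textConnection(content(res, as = 'text', encoding = 'UTF-8')))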