I'm trying to read a simple CSV file from a URL in Julia using Downloads and CSV, without success. This is what I've done so far:
```julia
using Downloads, CSV, DataFrames
url = "https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv"
f = Downloads.download(url)
df = CSV.read(f, DataFrame)
```
But I get the following error: `ArgumentError: Symbol name may not contain \0`
I've also tried `normalizenames`, again without success:
```julia
f = Downloads.download(url)
df = CSV.File(f, normalizenames=true)
```
But then I get an `Invalid UTF-8 string` error.
When I download the file manually and read it from my PC with `CSV.read`, I get no errors.
Answer:
The server serves that file with `Content-Encoding: gzip`, i.e. the transferred data is compressed and the client is expected to decompress it. You can see this yourself on the command line, since curl does not decompress by default:
```shell
$ curl https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
```
However, if you pass the `--compressed` flag:
```shell
$ curl --compressed https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv
"time","Nile"
1871,1120
1872,1160
1873,963
[...]
```
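You can also confirm from Julia that the file saved by `Downloads.download` is still compressed: gzip data always begins with the two magic bytes `0x1f 0x8b` (RFC 1952). The sketch below uses only Base; `is_gzipped` is a hypothetical helper name, not part of any package.

```julia
# Hypothetical helper: return true if the file at `path` starts with the
# gzip magic number 0x1f 0x8b (RFC 1952).
function is_gzipped(path::AbstractString)
    open(path, "r") do io
        # Read at most two bytes; a shorter file cannot be gzip data.
        magic = read(io, 2)
        return length(magic) == 2 && magic[1] == 0x1f && magic[2] == 0x8b
    end
end

# Usage with the file from the question (needs network access):
# using Downloads
# f = Downloads.download("https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv")
# is_gzipped(f)
```

If this returns `true` for the downloaded file, the `\0` and invalid UTF-8 errors come from CSV.jl parsing compressed bytes as text.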
Downloads.jl uses libcurl under the hood, and I can't find much mention of handling compressed content in the Downloads.jl repository.
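To check the response header from Julia instead of curl, a sketch along these lines should work, assuming `Downloads.request` (which returns a `Downloads.Response` whose `headers` field is a vector of name => value pairs) is available in your Downloads.jl version. `content_encoding` and `server_encoding` are hypothetical helper names; the header lookup is case-insensitive so it does not depend on how the header names are capitalized.

```julia
using Downloads

# Hypothetical helper: return the Content-Encoding value (lowercased) from a
# list of name => value header pairs, or `nothing` if the header is absent.
function content_encoding(headers)
    for (name, value) in headers
        lowercase(name) == "content-encoding" && return lowercase(strip(value))
    end
    return nothing
end

# Hypothetical helper: perform a GET request and report the body's encoding.
# `output = devnull` discards the body; this needs network access.
function server_encoding(url::AbstractString)
    resp = Downloads.request(url; output = devnull)
    return content_encoding(resp.headers)
end

# Usage (needs network access):
# server_encoding("https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv")
```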
As a fix for now, you can upgrade to v0.9.4 of CSV.jl, which handles gzipped CSV files transparently.
If updating is not an option, you can decompress the file manually with CodecZlib.jl:
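```julia
using Downloads, CSV, DataFrames, CodecZlib
url = "https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv"
f = Downloads.download(url)  # the saved file is still gzip-compressed
# Wrap the file handle in a decompressing stream and parse the decompressed CSV
df = open(fh -> CSV.read(GzipDecompressorStream(fh), DataFrame), f)
```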
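If you can't be sure the server will always compress the response, a variant that decompresses only when the file actually starts with the gzip magic bytes `0x1f 0x8b` avoids breaking on plain CSV. This is a sketch assuming CSV.jl, DataFrames.jl and CodecZlib.jl are installed; `read_maybe_gzipped_csv` is a hypothetical helper name.

```julia
using CSV, DataFrames, CodecZlib

# Hypothetical helper: read a CSV file into a DataFrame, decompressing it
# first only if it begins with the gzip magic bytes 0x1f 0x8b.
function read_maybe_gzipped_csv(path::AbstractString)
    open(path, "r") do io
        magic = read(io, 2)   # peek at (up to) the first two bytes
        seekstart(io)         # rewind so the parser sees the whole file
        if length(magic) == 2 && magic[1] == 0x1f && magic[2] == 0x8b
            return CSV.read(GzipDecompressorStream(io), DataFrame)
        else
            return CSV.read(io, DataFrame)
        end
    end
end

# Usage with the question's download:
# df = read_maybe_gzipped_csv(Downloads.download(url))
```

Checking the bytes rather than the URL or file extension matters here, because the `.csv` name gives no hint that the transferred data is compressed.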