I want to read the Stata file at http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta into Python. I tried the following:
import pandas as pd
import pyreadstat
dataframe, meta = pyreadstat.read_dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
This gives the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 260, in pyreadstat.pyreadstat.read_dta
  File "pyreadstat/_readstat_parser.pyx", line 1012, in pyreadstat._readstat_parser.run_conversion
pyreadstat._readstat_parser.PyreadstatError: File http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta does not exist!
I also tried pandas, but that failed as well:
>>> Data = pd.read_stata("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1898, in read_stata
    reader = StataReader(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1066, in __init__
    self._read_header()
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1095, in _read_header
    self._read_old_header(first_char)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1299, in _read_old_header
    raise ValueError(_version_error.format(version=self.format_version))
ValueError: Version of given Stata file is 110. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).
However, in R I can read the same file directly from the URL without any problem:
> head(read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta"))
  prate mrate totpart totelg age totemp sole  ltotemp
1  26.1  0.21    1653   6322   8   8709    0 9.072112
2 100.0  1.42     262    262   6    315    1 5.752573
3  97.6  0.91     166    170  10    275    1 5.616771
4 100.0  0.42     257    257   7    500    0 6.214608
5  82.5  0.53     591    716  28    933    1 6.838405
6 100.0  1.82      92     92   7    143    1 4.962845
Could you please help me load this data in Python?
CodePudding user response:
import requests
import pyreadstat

url = 'http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta'

def download_file(url):
    # Save into the current directory, named after the last URL segment.
    local_filename = url.split('/')[-1]
    # Stream the response so the whole file is never held in memory at once.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

# pyreadstat reads from a local path, so download first and pass the filename.
df, meta = pyreadstat.read_dta(download_file(url))
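If you would rather not leave a copy of the file in the working directory, the same download-then-read idea can be sketched with a temporary file and only the standard library for the download. This is a minimal sketch, not part of the original answer: the helper names `fetch_to_tempfile` and `load_remote_dta` are hypothetical, and it assumes `pyreadstat` is installed and that the file is small enough to copy in one pass.

```python
import os
import shutil
import tempfile
import urllib.request


def fetch_to_tempfile(url, suffix=".dta"):
    """Download `url` into a new temporary file and return its path.

    Hypothetical helper; the caller is responsible for deleting the file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Copy the response body to disk in chunks rather than reading it all at once.
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
        shutil.copyfileobj(resp, out)
    return path


def load_remote_dta(url):
    """Hypothetical helper: download, parse with pyreadstat, then clean up."""
    # Imported lazily so the download helper can be used without pyreadstat.
    import pyreadstat

    path = fetch_to_tempfile(url)
    try:
        # read_dta returns a (DataFrame, metadata) pair, as in the answer above.
        return pyreadstat.read_dta(path)
    finally:
        os.remove(path)
```

With this in place, `df, meta = load_remote_dta(url)` should behave like the answer's last line, except that the downloaded file is removed once it has been parsed.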