How to read .dta into Python

Time:03-21

I want to read the data at http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta. I tried the following:

import pandas as pd
import pyreadstat

dataframe, meta = pyreadstat.read_dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")

This raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 260, in pyreadstat.pyreadstat.read_dta
  File "pyreadstat/_readstat_parser.pyx", line 1012, in pyreadstat._readstat_parser.run_conversion
pyreadstat._readstat_parser.PyreadstatError: File http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta does not exist!

I also tried pandas, but that failed too:

>>> Data = pd.read_stata("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1898, in read_stata
    reader = StataReader(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1066, in __init__
    self._read_header()
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1095, in _read_header
    self._read_old_header(first_char)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1299, in _read_old_header
    raise ValueError(_version_error.format(version=self.format_version))
ValueError: Version of given Stata file is 110. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).
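The version number in that message is read from the file header. As a rough sketch (not part of the original post), the format version of a locally saved .dta file could be checked as below. It assumes the layout implied by the traceback above: older formats store the version in the first byte, while newer ones start with an XML-like header containing a <release> tag. The helper name dta_format_version is hypothetical.

```python
import re


def dta_format_version(path):
    """Return the format version recorded in the header of a local .dta file."""
    with open(path, "rb") as f:
        head = f.read(64)  # the version sits within the first few bytes
    if not head:
        raise ValueError("empty file")
    if head.startswith(b"<stata_dta>"):
        # Newer formats (117 and later) carry the version in a <release> tag.
        match = re.search(rb"<release>(\d+)</release>", head)
        if match is None:
            raise ValueError("no <release> tag found in header")
        return int(match.group(1))
    # Older formats store the version number as the very first byte.
    return head[0]
```

For the file in question this would be expected to report 110, the version named in the ValueError.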

With R, however, I could read the same file without any problem:

> head(read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta"))
  prate mrate totpart totelg age totemp sole  ltotemp
1  26.1  0.21    1653   6322   8   8709    0 9.072112
2 100.0  1.42     262    262   6    315    1 5.752573
3  97.6  0.91     166    170  10    275    1 5.616771
4 100.0  0.42     257    257   7    500    0 6.214608
5  82.5  0.53     591    716  28    933    1 6.838405
6 100.0  1.82      92     92   7    143    1 4.962845

Could you please help me to download this data with Python?

CodePudding user response:

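The error above suggests that pyreadstat.read_dta looks for a local file rather than fetching a URL, so download the file first and pass the saved path:

import requests
import pyreadstat

url = 'http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta'

def download_file(url):
    """Stream the file at `url` into the current directory and return its local name."""
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()  # stop on HTTP errors such as 404
        with open(local_filename, 'wb') as f:
            # write in 8 KiB chunks so the whole file is never held in memory
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

# read_dta is given the local path returned by download_file, not the URL
df, meta = pyreadstat.read_dta(download_file(url))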
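
If installing requests is not an option, the same download-then-read idea can be sketched with the standard library alone. This is only an illustration under a few assumptions: the helper name fetch_to_tempfile is hypothetical, urllib.request stands in for requests, and the file is written to a temporary location instead of the current directory.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_to_tempfile(url, suffix=".dta"):
    """Copy the resource at `url` into a new temporary file and return its path."""
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed,
        # so a reader can open it by path afterwards.
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)  # streamed copy, chunk by chunk
            return Path(tmp.name)
```

The returned path could then be handed to the reader, for example pyreadstat.read_dta(str(fetch_to_tempfile(url))). The temporary file is not cleaned up automatically, so remove it once the data is loaded.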