Home > Mobile >  How to properly include data folder to python package
How to properly include data folder to python package

Time:03-30

I'm building a small python package that I deploy to our internal pypi server to be easily installable with pip. I'm using setup.py to build the tar.gz archive to upload there. And I need to include some additional data - to be more specific, I use nltk in my project and I want to ship the package with specific nltk data already downloaded, since it doesn't make sense to me to make the person using my package responsible for downloading it themself. So I have the following structure

├── setup.py
├── src
│   ├── __init__.py
│   ├── my_pkg
│   │   ├── __init__.py
│   │   ├── my_module.py
│   │   └── resources
│   │       └── nltk_data
|   |           └─... too many subfolders and files

And I would like to include the whole nltk_data subfolder to be in the same place once the package is installed. I managed to get working package_data={'my_pkg' :['./resources/file.dat']}, for one file, but I don't know how to do the same with complex directory structure with many subfolder, subsubfolders, file of different extensions etc. Is there any way to do this?

My setup.py is quite simple (I omitted things such as description or URL for simplicity)

from setuptools import setup, find_packages

setup(
    name='some-cool-name',
    version="1.0.0",
    classifiers=[],
    
    packages=find_packages(where='src'),
    package_dir={'': 'src'},
    package_data={'my_pkg' :[]},
    include_package_data=True,
    py_modules=[],

    python_requires='>=3.8',
    install_requires=['nltk==3.6.5']
)

CodePudding user response:

You can simply specify the relative path to the data you want to include. You need to put an __init__.py-file in both subfolders though, but then it should work.

package_data={'my_pkg' :['my_pkg/resources/nltk_data/*']}

To use the data in your script, use importlib (for example importlib.read_text) to open your desired file.

CodePudding user response:

A while after I posted this question, I encountered another solution, that was not mentioned here and seemed quite elegant, so I will put it here in case someone finds it useful. File MANIFEST.in in the top level of the directory structure, next to setup.py, can do the same easily, with

recursive-include src/my_pkg/resources *
  • Related