I have a number of Python tools that I run, mostly for scraping websites or pulling data from various web APIs.
Currently they are all run from bash with the parameters passed as command line arguments e.g.
$ python my_tool.py -arg1 -arg2 --output foobar.json
I want to move away from using the command line and instead run the tools via a web interface. My aim is to have a setup where a user can authenticate to the website, enter the arguments via a web form, click to run the tools, and have the returned JSON data saved to a database for later analysis.
I've chosen Django as the framework for the web interface and database since I already have experience with Python. However, I'm seeking advice on best practice for integrating the Python tools I already have with Django. Am I best to integrate the tools directly as Django apps, or keep them separate but with a means to communicate with Django? I'd be grateful for the advice and experience of others on this one.
CodePudding user response:
I think you should definitely create a package containing all your scripts. Whether to put it inside or outside an app depends: if your scripts are intended to be used across the whole project, the package should live outside the app; otherwise, inside it.
Inside the app
├── scrape
│ ├── db_tools
│ │ ├── __init__.py
│ │ ├── scrape_etc.py
│ ├── views.py
│ └── forms.py
Outside the app
├── db_tools
│ ├── __init__.py
│ ├── scrape_etc.py
├── some_app
└── other_app
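With either layout, the key step is exposing the tool's logic as an importable function rather than a script. A minimal sketch, assuming the "outside the app" layout above; run_scrape and its return shape are hypothetical placeholders for your actual tool:

```python
# db_tools/scrape_etc.py -- hypothetical sketch: wrap the tool's logic in a
# plain function so Django views (or anything else) can import and call it.
def run_scrape(url: str) -> dict:
    """Scrape `url` and return the results as a dict (placeholder logic)."""
    # a real implementation would fetch and parse the page here
    return {"url": url, "status": "ok"}

# a Django view in some_app/views.py would then simply do:
#   from db_tools.scrape_etc import run_scrape
#   data = run_scrape(request.GET["url"])
```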
As an additional note, if the tools are time-consuming you could run them as background tasks (using Celery) and send the result by email or store it somewhere.
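To illustrate the background-task idea without pulling in Celery, here is a stdlib-only stand-in (in production a Celery task would replace the executor; all names here are hypothetical):

```python
# Minimal sketch of running a slow tool off the request thread.
# Celery would replace ThreadPoolExecutor in a real deployment.
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def slow_tool(arg: str) -> dict:
    # placeholder for a long-running scrape
    return {"arg": arg, "done": True}

def submit_job(arg: str) -> Future:
    # returns immediately; the view can poll future.done() later,
    # or a callback can save the result to the database
    return executor.submit(slow_tool, arg)
```

The view responds right away with a job ID, and the result is persisted whenever the task finishes.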
CodePudding user response:
maintenance of API and its code
Good news, you have lots of options!
You used to invoke my_tool.py from bash. Perhaps it starts with import typer. It is working now, and it is well tested.
In future most invocations (all invocations?) will be via web.
Bad news, the bits will rot as months go by. The tool will sprout new features, and acquire bug fixes. An interface that is not regularly tested is one that will eventually behave less predictably than desired.
Here's the decision you are now faced with making:
How will folks call the tool's code in future?
That is, what is the supported way to call it? There are some choices available.
1. keep the CLI interface
The CLI API is already supported and working. You might even have automated tests that exercise it, verifying all is well. Perhaps there are multiple callers, such as production users at bash prompt, developers at bash prompt, Makefiles, cron jobs. Web becomes just one more caller.
You might choose to preserve this status quo, and additionally have the web server fork / exec the tool as a child process. One advantage is that, in the event of a resource leak, the tool is a short-lived process which releases its resources upon exiting. A disadvantage is the need to re-import Python libraries on every run, which in some cases can take more than one second.
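A minimal sketch of the fork / exec approach, assuming the tool writes JSON to stdout (the stand-in command below just emits JSON so the example is runnable anywhere):

```python
# Run the existing CLI tool as a short-lived child process and parse its
# JSON output. The CLI stays the single supported interface.
import json
import subprocess
import sys

def run_tool(cmd, timeout=300):
    """Fork/exec `cmd`, wait for it to exit, and parse its JSON stdout."""
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          check=True, timeout=timeout)
    return json.loads(proc.stdout)

# In production this might look like:
#   run_tool(["python", "my_tool.py", "-arg1", "-arg2", "--output", "-"])
# Stand-in command so the sketch is self-contained:
result = run_tool([sys.executable, "-c",
                   "import json; print(json.dumps({'ok': True}))"])
```

check=True raises if the tool exits non-zero, and timeout guards against a hung scraper tying up the web worker.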
2. support both
As months go by there will be new features added, and some of them might tack on new args to the existing API.
Commit to supporting both CLI & web as first-class consumers of the code. Take care to add automated tests for both, and run them with each release.
This suggests that Django will call into public functions within the same process. That should be faster, since imports and data structures are already initialized. We hope there are no resource leaks.
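One common layout for "both are first-class": a single core function, with the CLI and the web view as thin wrappers around it. A sketch with hypothetical names, using argparse for the CLI side (your tool may use typer instead):

```python
# Shared core called by both the CLI and the Django view.
import argparse
import json

def scrape(url: str, limit: int = 10) -> dict:
    """Core logic shared by CLI and web (placeholder implementation)."""
    return {"url": url, "limit": limit}

def cli(argv=None) -> str:
    # CLI wrapper: parse args, call the core, emit JSON
    parser = argparse.ArgumentParser()
    parser.add_argument("url")
    parser.add_argument("--limit", type=int, default=10)
    ns = parser.parse_args(argv)
    return json.dumps(scrape(ns.url, ns.limit))

# the Django view calls the core directly, in-process:
#   def scrape_view(request):
#       return JsonResponse(scrape(request.GET["url"]))
```

Automated tests can then exercise both wrappers against the same core with each release.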
3. web only
Abandon / desupport the CLI interface, to save support costs.
Note that you will still need automated tests to instill confidence in new releases down the road. You might find that CLI support comes in handy during the testing phase.
In the end, the decision is yours. Life is full of trade-offs.