I have a process goodreads-user-scraper that runs fine within a cron scheduler script that I run from my Ubuntu terminal.
From my Ubuntu server terminal, I navigate to the directory containing scheduler.py and write:
python scheduler.py
This runs fine. It scrapes the site and saves files to the output_dir I have assigned inside the script.
Now, I want to run this function using a service file (socialAggregator.service).
When I set up a service file in my Ubuntu server to run scheduler.py, goodreads-user-scraper is not recognized. It's the exact same file I just ran from the terminal.
Why is goodreads-user-scraper not found when the service file calls the script?
Any ideas?
Error message from the syslog file:
Jan 12 22:13:15 speedypersonal2 python[2668]: --user_id: 1: goodreads-user-scraper: not found
socialAggregator.service
[Unit]
Description=Run Social Aggregator scheduler - collect data from API's and store in socialAggregator Db --- DEVELOPMENT ---
After=network.target
[Service]
User=nick
ExecStart= /home/nick/environments/social_agg/bin/python /home/nick/applications/socialAggregator/scheduler.py --serve-in-foreground
[Install]
WantedBy=multi-user.target
scheduler.py
from apscheduler.schedulers.background import BackgroundScheduler
import json
import requests
from datetime import datetime, timedelta
import os
from sa_config import ConfigLocal, ConfigDev, ConfigProd
import logging
from logging.handlers import RotatingFileHandler
import subprocess
if os.environ.get('CONFIG_TYPE')=='local':
    config = ConfigLocal()
elif os.environ.get('CONFIG_TYPE')=='dev':
    config = ConfigDev()
elif os.environ.get('CONFIG_TYPE')=='prod':
    config = ConfigProd()
#Setting up Logger
formatter = logging.Formatter('%(asctime)s:%(name)s:%(message)s')
formatter_terminal = logging.Formatter('%(asctime)s:%(filename)s:%(name)s:%(message)s')
#initialize a logger
logger_init = logging.getLogger(__name__)
logger_init.setLevel(logging.DEBUG)
#where do we store logging information
file_handler = RotatingFileHandler(os.path.join(config.PROJ_ROOT_PATH,'social_agg_schduler.log'), mode='a', maxBytes=5*1024*1024,backupCount=2)
file_handler.setFormatter(formatter)
#where the stream_handler will print
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter_terminal)
logger_init.addHandler(file_handler)
logger_init.addHandler(stream_handler)
def scheduler_funct():
    logger_init.info(f"- Started Scheduler on {datetime.today().strftime('%Y-%m-%d %H:%M')}-")
    scheduler = BackgroundScheduler()
    job_collect_socials = scheduler.add_job(run_goodreads,'cron', hour='*', minute='13', second='15')#Testing
    scheduler.start()
    while True:
        pass
def run_goodreads():
    logger_init.info(f"- START run_goodreads() -")
    output_dir = os.path.join(config.PROJ_DB_PATH)
    goodreads_process = subprocess.Popen(['goodreads-user-scraper', '--user_id', config.GOODREADS_ID,'--output_dir', output_dir], shell=True, stdout=subprocess.PIPE)
    logger_init.info(f"- send subprocess now on::: goodreads_process.communicate() -")
    _, _ = goodreads_process.communicate()
    logger_init.info(f"- FINISH run_goodreads() -")
if __name__ == '__main__':
    scheduler_funct()
Answer:
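The problem was that the service was not running with the same environment as my terminal session. In the terminal my venv is activated, so its bin directory is on PATH. The service only calls the venv's python, which does not change PATH, so the shell started by subprocess could not find goodreads-user-scraper.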
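To confirm this kind of mismatch, it can help to log what PATH the process actually sees and whether the command resolves on it. The following is a minimal sketch, not part of the original script; the helper name describe_command_lookup is made up for illustration. It uses shutil.which from the standard library, which performs the same kind of PATH lookup a shell would.

```python
import os
import shutil


def describe_command_lookup(name, path=None):
    """Return a one-line report of where `name` resolves on the given PATH."""
    # Default to the PATH of the current process. Under systemd this is
    # usually much shorter than the PATH of an interactive shell.
    search_path = os.environ.get("PATH", "") if path is None else path
    resolved = shutil.which(name, path=search_path)
    return f"{name} -> {resolved or 'NOT FOUND'} (PATH={search_path})"


if __name__ == "__main__":
    # Run this once from the terminal and once from the service, then compare.
    print(describe_command_lookup("goodreads-user-scraper"))
```

Calling this at the top of scheduler.py (for example through the existing logger) should show the venv's bin directory in one output and not in the other.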
Below is the service file that now works.
[Unit]
Description=Run Social Aggregator scheduler - collect data from API's and store in socialAggregator Db --- DEVELOPMENT ---
After=network.target
[Service]
User=nick
ExecStart= /home/nick/environments/social_agg/bin/python /home/nick/applications/socialAggregator/scheduler.py --serve-in-foreground
Environment=PATH=/home/nick/environments/social_agg/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
[Install]
WantedBy=multi-user.target
I added the line Environment=PATH=<path_from_terminal_and_venv_activated> to the [Service] section.
For additional clarity, path_from_terminal_and_venv_activated is obtained by:
- Activating my Python venv in the terminal
- Copying the result of echo $PATH
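An alternative that avoids depending on PATH at all is to call the scraper by its full path inside the venv. The sketch below assumes goodreads-user-scraper was installed into the same venv as the interpreter running scheduler.py, so both sit in the same bin directory; build_scraper_command is a hypothetical helper, and run_goodreads here takes its values as parameters instead of reading them from config. It also drops shell=True: on POSIX, passing a list together with shell=True makes only the first item the shell command and hands the rest to the shell itself, which matches the "--user_id:" prefix in the syslog error.

```python
import subprocess
import sys
from pathlib import Path


def build_scraper_command(user_id, output_dir):
    """Build an argument list that points at the scraper inside the venv."""
    # Assumption: the scraper's console script lives next to the interpreter
    # that is running this file (the venv's bin directory).
    scraper = Path(sys.executable).parent / "goodreads-user-scraper"
    return [str(scraper), "--user_id", str(user_id), "--output_dir", str(output_dir)]


def run_goodreads(user_id, output_dir):
    # No shell=True: the list is executed directly, so every argument
    # reaches the scraper and no PATH lookup is involved.
    result = subprocess.run(
        build_scraper_command(user_id, output_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode
```

With this approach the service file would not need the Environment=PATH line for this particular command, although setting PATH is still useful if the scraper itself launches other programs.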