I'm trying to dockerize a Python application that runs in a conda environment and queries a PostgreSQL database using SQLAlchemy. I've managed to dockerize my conda env and run Jupyter on it, I've also managed to run a Postgres image separately, and I've built a docker-compose.yml with both services, but I can't get them to work together.
I'm running the following code in a Jupyter notebook with the conda env installed and I'm getting (psycopg2.OperationalError) could not connect to server: Connection refused.
import sqlalchemy
engine = sqlalchemy.create_engine('postgres://admin:admin@localhost:5432/postgresdb')
engine.connect()
I'm quite new to Docker, so I'm sure I'm getting some configuration wrong; any help would be great.
Configuration
- Docker compose:
version: "3"
services:
  database:
    container_name: postgres
    image: 'bitnami/postgresql:12'
    environment:
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=admin
      - POSTGRES_HOST=localhost
      - POSTGRES_PORT=5432
      - POSTGRES_DB=postgresdb
    ports:
      - "5432:5432"
    volumes:
      - db_volume:/var/lib/postgresql
  jupyter-conda:
    build: .
    ports:
      - "8888:8888"
    depends_on:
      - database
volumes:
  db_volume:
- Dockerfile:
FROM ubuntu
SHELL [ "/bin/bash", "--login", "-c" ]
# Create a non-root user
ENV USER arquimedes
ENV UID 1000
ENV GID 1000
ENV HOME /home/$USER
RUN adduser --disabled-password \
    --gecos "Non-root user" \
    --uid $UID \
    --home $HOME \
    $USER
RUN apt-get update
RUN apt-get install wget -y
COPY environment.yml requirements.txt /tmp/
RUN chown $UID:$GID /tmp/environment.yml /tmp/requirements.txt
COPY postBuild /usr/local/bin/postBuild.sh
RUN chown $UID:$GID /usr/local/bin/postBuild.sh && \
    chmod u+x /usr/local/bin/postBuild.sh
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chown $UID:$GID /usr/local/bin/entrypoint.sh && \
    chmod u+x /usr/local/bin/entrypoint.sh
USER $USER
# install miniconda
ENV MINICONDA_VERSION latest
ENV CONDA_DIR $HOME/miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-$MINICONDA_VERSION-Linux-x86_64.sh -O ~/miniconda.sh && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p $CONDA_DIR && \
    rm ~/miniconda.sh
# make non-activate conda commands available
ENV PATH=$CONDA_DIR/bin:$PATH
# make conda activate command available from /bin/bash --login shells
RUN echo ". $CONDA_DIR/etc/profile.d/conda.sh" >> ~/.profile
# make conda activate command available from /bin/bash --interactive shells
RUN conda init bash
# create a project directory inside user home
ENV PROJECT_DIR $HOME/conda-oracle
RUN mkdir $PROJECT_DIR
WORKDIR $PROJECT_DIR
# build the conda environment
ENV ENV_PREFIX $PROJECT_DIR/env
RUN conda update --name base --channel defaults conda && \
    conda env create --prefix $ENV_PREFIX --file /tmp/environment.yml
# run the postBuild script to install any JupyterLab extensions
RUN conda activate $ENV_PREFIX && \
    /usr/local/bin/postBuild.sh && \
    conda deactivate
ENTRYPOINT [ "/usr/local/bin/entrypoint.sh" ]
# expose port the api
EXPOSE 8888
# password for jupyter
ENV JUPYTER_TOKEN eureka
# copy source files
COPY --chown=$UID:$GID /src $PROJECT_DIR/src
# start the container with jupyter
CMD [ "jupyter", "lab", "--no-browser", "--ip", "0.0.0.0" ]
docker-compose up output (screenshot omitted)
EDIT: Part 2
So, I've been able to connect to the database now thanks to jabbson's answer, but I'm now having trouble interacting with the db. I'm trying to publish a table to the database from pandas and then query it; the publish works, but I get an error when querying. Does anybody have any idea what I'm doing wrong? Thank you so much for your time! Code below.
import pandas as pd
from datetime import date
from typing import Dict, Optional, Any
from sqlalchemy import create_engine
from sqlalchemy.types import CHAR, INTEGER, DATE
CREDENTIALS = 'postgresql://admin:admin@database:5432/postgresdb'
def create_table_from_pandas(df: pd.DataFrame, table_name: str, schema: Dict[str, Any], chunksize: Optional[int] = None) -> None:
    connection = create_engine(CREDENTIALS)
    # if_exists='replace' first drops whatever was previously in the table
    df.to_sql(
        name=table_name,
        con=connection,
        if_exists="replace",
        index=False,
        dtype=schema,
        chunksize=chunksize,
    )

def execute_query(query: str, params: Optional[Dict[str, str]] = None) -> pd.DataFrame:
    connection = create_engine(CREDENTIALS)
    df_result = pd.read_sql(query, params=params, con=connection)
    return df_result
df_test = pd.DataFrame({'COL1':[1,2,3,4,5], 'COL2':['a','b','c','d','e'], 'COL3':[date.today()]*5})
df_schema = {
    "COL1": INTEGER(),
    "COL2": CHAR(20),
    "COL3": DATE(),
}
create_table_from_pandas(df_test, 'DFTEST', df_schema)
Everything is fine until this point, then I run:
execute_query("SELECT * FROM DFTEST")
And get: UndefinedTable: relation "dftest" does not exist
If I run
execute_query("SELECT * FROM information_schema.tables WHERE table_name = 'DFTEST'")
I can see the table (screenshot of the pandas result omitted).
Again, thanks so much for your time, I'm still pretty new to all this.
CodePudding user response:
The issue is that you are connecting to localhost
from your Jupyter notebook. Inside the compose network, localhost refers to the jupyter-conda container itself, not to the database container. If you want to connect to another service, use its service name as the host:
postgresql://admin:admin@database:5432/postgresdb
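A minimal sketch of checking the corrected URL before connecting (note that SQLAlchemy 1.4+ requires the postgresql:// dialect name; the old postgres:// alias was removed):

```python
from sqlalchemy.engine.url import make_url

# In a compose network each service is reachable by its service name,
# so the host here is "database", not "localhost".
url = "postgresql://admin:admin@database:5432/postgresdb"

# make_url parses the URL without opening a connection, which is a
# handy sanity check before calling create_engine(url)
parsed = make_url(url)
print(parsed.host, parsed.port, parsed.database)  # database 5432 postgresdb
```

From there, create_engine(url).connect() should reach the Postgres container, assuming psycopg2 is installed in the conda environment.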
For the second question, run
execute_query("SELECT * FROM \"DFTEST\"")
or
execute_query('SELECT * FROM "DFTEST"')
more details here
CodePudding user response:
jabbson's answer solved both problems; in addition, creating the table in lowercase also solves the query problem.
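To illustrate the lowercase approach: Postgres folds unquoted identifiers to lowercase, while to_sql quotes the name it is given, so a mixed-case table like DFTEST can only be queried back with quotes. A lowercase table name sidesteps the mismatch entirely. A small sketch (using an in-memory SQLite engine here just to keep the example self-contained; the same pattern applies with the Postgres URL above):

```python
import pandas as pd
from datetime import date
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # stand-in for the Postgres engine

df_test = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": ["a", "b", "c", "d", "e"],
    "col3": [date.today()] * 5,
})

# lowercase table name: no quoting needed when querying it back
df_test.to_sql("dftest", con=engine, if_exists="replace", index=False)

result = pd.read_sql("SELECT * FROM dftest", con=engine)
print(len(result))  # 5
```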