How are user files and log files stored when using Docker? Any sample GitHub project available?

Time:09-10

The application:
I'm building a statistics app which allows users to log in, upload data files, and run statistical calculations on the data. The generated plots would be displayed to them in the browser and would also need to be saved to disk (to avoid having to re-generate them the next time the user wants to view them).

Files anticipated:

  1. The data files the user uploads (xls or csv files).
  2. The generated plots (png files).
  3. User preferences/settings of how they use the app.
  4. Log files of actions taken by the user, errors encountered, etc. (logger files generated by Python, Julia or R)

My initial assumptions:

  1. and 2. I've seen volumes, but how will the user files be stored there? Do I just create a folder for each user and store all their data there? Any security issues you see?
  3. I assume these would get stored in an SQL or NoSQL database.
  4. Aren't logs normally sent to a separate Docker container that's specifically meant for storing logs? So rather than use a standard logger, would it be better to send the log message to a container or database, using RabbitMQ?

Would there be any sample project on GitHub or GitLab that I could refer to?
If Docker wasn't used and the app was just deployed on a server, the user files would simply be stored in folders on the server, correct? I believe it is "frowned upon" to store files as BLOBs in a database.

CodePudding user response:

If at all possible, you should store all of the data in a database. If neither the input files nor the rendered charts will be too large (their size can be reasonably measured in kilobytes, say) then you could store them in a binary-object column in the database. If you can do this, then your application will not need any volumes or other persistent storage (only the database will) and this will make it much easier to scale and update your application.
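As a rough illustration of the binary-column approach, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in so the example runs anywhere; in a containerized deployment you would more likely point this at a separate database server. The table name `user_files` and the helper names are assumptions made for this example, not something from the question.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and make sure the (hypothetical) user_files table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_files ("
        " user_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " kind TEXT NOT NULL,"       # e.g. 'upload' or 'plot'
        " content BLOB NOT NULL,"    # raw bytes of the csv/xls/png file
        " PRIMARY KEY (user_id, name))"
    )
    return conn


def save_file(conn, user_id, name, kind, content):
    """Insert the file, or overwrite an existing file with the same name."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO user_files (user_id, name, kind, content)"
            " VALUES (?, ?, ?, ?)",
            (user_id, name, kind, content),
        )


def load_file(conn, user_id, name):
    """Return the stored bytes, or None if this user has no such file."""
    row = conn.execute(
        "SELECT content FROM user_files WHERE user_id = ? AND name = ?",
        (user_id, name),
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = open_store()
save_file(conn, "alice", "plot.png", "plot", b"\x89PNG fake image bytes")
cached_plot = load_file(conn, "alice", "plot.png")
```

Because every lookup is keyed by `user_id`, one user cannot read another user's file just by guessing its name.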

If you're running this in a cloud environment, using a hosted storage system like AWS S3 also makes sense, checks the same boxes, and avoids the minor ugliness of storing unstructured binary data in a structured data store.

Volumes are potentially an option, but become tricky. You tagged this question with both Docker and Kubernetes. Both have ways to allocate storage and mount it into containers. As far as your application code is concerned, the mounted volume is just a filesystem path and it can read and write files there normally. There are potential problems with filesystem permissions in both environments, and if you run multiple copies of your application (especially easy in Kubernetes) there are risks of the replicas trying to access the same files at the same time. The storage types that are easier to get in Kubernetes can't be used on multiple nodes at the same time, which limits the utility of trying to share a single volume.
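If you do go with a volume and one folder per user, the main security concern in application code is a user-supplied name escaping that folder (for example `../bob/data.csv`). The following is a minimal sketch of one way to guard against that. It assumes the volume is mounted at the path in a `DATA_ROOT` environment variable; that variable name, the `/data` default, and the helper names are hypothetical.

```python
import os
from pathlib import Path

# Assumption: the volume is mounted here (hypothetical variable name and default).
DATA_ROOT = Path(os.environ.get("DATA_ROOT", "/data"))


def user_file_path(root, user_id, filename):
    """Build <root>/<user_id>/<filename>, rejecting anything that escapes the user's folder."""
    root = Path(root).resolve()
    user_dir = (root / user_id).resolve()
    target = (user_dir / filename).resolve()
    # The user's folder must sit directly under the root, and the file must be inside it.
    if user_dir.parent != root or user_dir not in target.parents:
        raise ValueError("path escapes the user's directory")
    return target


def save_upload(root, user_id, filename, content):
    """Write the uploaded bytes into the user's folder, creating the folder if needed."""
    target = user_file_path(root, user_id, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

A call such as `save_upload(DATA_ROOT, "alice", "data.csv", uploaded_bytes)` would then write to `alice/data.csv` under the mounted path. This does not address the concurrent-access problem between replicas described above.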

(More specifically in Kubernetes, if you must use local storage, I'd recommend a StatefulSet to manage this. This will automatically create the storage and attach it to the Pods if correctly configured. You probably shouldn't manually create a PersistentVolumeClaim; you very likely shouldn't manually create a PersistentVolume; you almost definitely shouldn't use hostPath: type storage.)


For logs, you should configure your application to write logs normally to its stdout. docker logs or kubectl logs will be able to retrieve the logs, and most log-management systems can be straightforwardly configured to collect the Docker or Kubernetes container logs. Even if you're running your log collector in a container (or Kubernetes DaemonSet) it will often have access to the host's or node's log directory. You can also configure plain Docker to send the logs somewhere else, but only if the application sends the logs to stdout.
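Since the question mentions Python loggers, here is a minimal sketch of pointing the standard `logging` module at stdout instead of a log file. The logger name `stats_app` and the message format are assumptions chosen for the example.

```python
import logging
import sys


def configure_logging(level=logging.INFO, stream=None):
    """Send all log records to stdout (or to `stream`, which is handy in tests)."""
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers.clear()   # drop any file handlers configured earlier
    root.addHandler(handler)
    root.setLevel(level)
    return root


configure_logging()
log = logging.getLogger("stats_app")  # hypothetical logger name
log.info("user %s uploaded %s", "alice", "data.csv")
```

With this in place, `docker logs` or `kubectl logs` on the container should show these lines without any log file or volume being involved.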

If the per-action log messages need to be visible to the end user then you'll somehow need to collect them within the application. This isn't specific to the container technology; you'd have to do this even if you were running the application directly on a server.
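One possible way to collect such messages inside a Python application is a custom `logging.Handler` that files records under the user they belong to. This is only a sketch: the class name, the `user_id` attribute passed through `extra`, and the in-memory storage are all assumptions. With several replicas you would write to the shared database rather than to process memory.

```python
import logging
from collections import defaultdict, deque


class PerUserLogHandler(logging.Handler):
    """Keep the most recent messages for each user (in memory, for illustration only)."""

    def __init__(self, max_per_user=100):
        super().__init__()
        self.records = defaultdict(lambda: deque(maxlen=max_per_user))

    def emit(self, record):
        # Only records logged with extra={"user_id": ...} are collected.
        user_id = getattr(record, "user_id", None)
        if user_id is not None:
            self.records[user_id].append(self.format(record))

    def messages_for(self, user_id):
        """Return the stored messages for one user, oldest first."""
        return list(self.records.get(user_id, ()))


log = logging.getLogger("stats_app.actions")  # hypothetical logger name
log.setLevel(logging.INFO)
handler = PerUserLogHandler()
log.addHandler(handler)

log.info("regression finished", extra={"user_id": "alice"})
```

The same records still propagate to any stdout handler on the root logger, so operational logging and user-visible history can coexist.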
