I want to create a web interface for a folder structure on my server, basically like Dropbox or any other cloud storage, but read-only: no creating new files and no uploading.
I'm wondering what the best way would be to create a "virtual" representation of an existing folder structure on my server.
My idea was to recursively walk the directory on the server and create a database entry for every file and folder, using a hash of each entry as its unique ID.
Like this:
```javascript
// Flat map of entries keyed by ID; folders list their children's IDs.
const entries = {
  '1382b6993e9f270cb1c29833be3f5750': {
    type: 'folder',
    name: 'root',
    path: '/',
    parentPath: null,
    parentID: null,
    children: ['147d0ef33fe657ce53a83de6a630473d'],
  },
  '147d0ef33fe657ce53a83de6a630473d': {
    type: 'folder',
    name: 'pictures',
    parentID: '1382b6993e9f270cb1c29833be3f5750',
    parentPath: '/',
    path: '/pictures',
    children: ['8f7c5959dbb088c0aef8b145dbdf6e43'],
  },
  '8f7c5959dbb088c0aef8b145dbdf6e43': {
    type: 'file',
    name: 'cat.jpg',
    parentID: '147d0ef33fe657ce53a83de6a630473d',
    parentPath: '/pictures',
    path: '/pictures/cat.jpg',
  },
};
```
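The recursive walk described above could look roughly like this in Node.js. This is a minimal sketch, not a definitive implementation: it assumes the ID is an MD5 hash of the entry's path relative to the scanned root (so the IDs will not match the example values), and `idFor` and `buildIndex` are hypothetical names.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Assumption: the ID is an MD5 hash of the path relative to the scanned root.
// Hashing file *contents* instead would give duplicate files the same ID,
// and folders have no contents to hash.
function idFor(relPath) {
  return crypto.createHash("md5").update(relPath).digest("hex");
}

// Recursively walk `rootDir` and return a flat object keyed by ID, with the
// same fields as the example entries above.
function buildIndex(rootDir) {
  const index = {};

  function visit(absPath, relPath, parentPath, parentID) {
    const id = idFor(relPath);
    // lstat does not follow symlinks, so a symlink loop cannot make the walk
    // recurse forever; symlinks are simply recorded as files here.
    const isDir = fs.lstatSync(absPath).isDirectory();
    const entry = {
      type: isDir ? "folder" : "file",
      name: relPath === "/" ? "root" : path.posix.basename(relPath),
      path: relPath,
      parentPath,
      parentID,
    };
    index[id] = entry; // parents are stored before their children

    if (isDir) {
      entry.children = [];
      // Sorting keeps the output stable between scans.
      for (const name of fs.readdirSync(absPath).sort()) {
        const childRel = path.posix.join(relPath, name);
        entry.children.push(visit(path.join(absPath, name), childRel, relPath, id));
      }
    }
    return id;
  }

  // Resolve the root itself so a symlinked root is still walked as a folder.
  visit(fs.realpathSync(rootDir), "/", null, null);
  return index;
}
```

Because the ID is derived from the path, renaming or moving a file gives it a new ID; that is usually acceptable for a read-only mirror.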
To account for changes in the directory I'd periodically run a process to scan it and update the database accordingly.
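The periodic rescan mostly comes down to comparing two snapshots. A sketch, assuming each scan produces an ID-keyed object like the one above with IDs derived from paths; `diffIndexes` is a hypothetical helper, and comparing entries with `JSON.stringify` assumes both snapshots were built by the same code, so their keys appear in the same order.

```javascript
// Compare the previous snapshot (e.g. loaded from the database) with a fresh
// scan and report which IDs to insert, delete, or update.
function diffIndexes(oldIndex, newIndex) {
  const added = [];
  const removed = [];
  const changed = [];

  for (const id of Object.keys(newIndex)) {
    if (!Object.hasOwn(oldIndex, id)) {
      added.push(id);
    } else if (JSON.stringify(oldIndex[id]) !== JSON.stringify(newIndex[id])) {
      // Same path, different data (e.g. a folder whose children changed).
      changed.push(id);
    }
  }
  for (const id of Object.keys(oldIndex)) {
    if (!Object.hasOwn(newIndex, id)) removed.push(id);
  }
  return { added, removed, changed };
}
```

With path-based IDs, a rename shows up as one removal plus one addition, which a read-only mirror can apply without special handling.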
I should mention that the directory is rather large, with many subfolders and hundreds of files.
Because of that, I can see keeping the whole tree in React state being a problem, but I guess that could be solved by fetching a directory's contents whenever the user navigates to it on the frontend.
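One way to fetch on navigation without holding the whole tree is a small cache keyed by folder path, so the frontend only keeps the folders the user has actually opened. A minimal, framework-independent sketch; `createFolderCache` and `fetchChildren` are hypothetical names, and `fetchChildren` is assumed to return a promise of one folder's direct children (for example by wrapping a call to your API).

```javascript
function createFolderCache(fetchChildren) {
  const cache = new Map(); // folder path -> Promise of its children

  return {
    // Return the folder's children, calling fetchChildren only on first use.
    // Caching the promise also merges concurrent requests for one folder.
    get(folderPath) {
      if (!cache.has(folderPath)) {
        const pending = fetchChildren(folderPath).catch((err) => {
          // Forget failed requests so the next get() retries.
          if (cache.get(folderPath) === pending) cache.delete(folderPath);
          throw err;
        });
        cache.set(folderPath, pending);
      }
      return cache.get(folderPath);
    },
    // Drop one cached listing, e.g. after a rescan changed that folder.
    invalidate(folderPath) {
      cache.delete(folderPath);
    },
  };
}
```

In a React component you would then keep only the current path and its listing in state, and call `get(path)` when the user navigates.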
Does this approach make sense or is there a better way?
Answer:
You can optimize for either disk usage or CPU usage, depending on what is more convenient for you. In the end, it is a tradeoff between the two.
If you want to minimize CPU load and know that disk space is cheap, you can store a representation of your entire filesystem tree in a database, as you did, and update it periodically. The main pro is that serving requests has a very small CPU footprint, so if your app is used by many users, your server won't be overloaded by API requests. The main cons are that each periodic update of the tree costs a burst of CPU and disk usage, and that you may have synchronization issues (for example, a file that has been created on disk but not yet stored in the database).
On the database side, for a read-heavy workload, a NoSQL database may be a better fit than a SQL database when there are few or no relations between the data and/or you need fast reads at large scale.
If you want to stick with a SQL database, you should use a foreign key to store the parent folder. For example, when creating a database entry for a folder, you can do something like this:
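```sql
-- Standard SQL uses single quotes for string literals (double quotes denote identifiers).
INSERT INTO files (path, type, parent) VALUES ('/home/johndoe', 'FOLDER', 987126398726387);
```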
where `987126398726387` would be the `id` column of the `files` row for the `/home` folder, for instance. That keeps the logic for listing a folder very simple, since a single query retrieves all of its entries:
```sql
SELECT * FROM files WHERE parent = 987126398726387;
```
You can then use the full power of SQL to order, group, and filter the results however you need.
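To connect the question's hash-keyed entries with the `files(path, type, parent)` table above, here is a sketch in plain JavaScript (no database driver assumed). It converts the entries into rows ordered parents-first, so inserting them one by one never violates a foreign key on `parent`, and it groups rows by parent as an in-memory stand-in for `SELECT * FROM files WHERE parent = ?`. It assumes string hash IDs rather than the numeric `id` in the example; `toRows` and `groupByParent` are hypothetical helpers.

```javascript
// Convert an ID-keyed index into rows for the `files` table, parents first.
function toRows(index) {
  const rows = Object.entries(index).map(([id, e]) => ({
    id,
    path: e.path,
    type: e.type === "folder" ? "FOLDER" : "FILE",
    parent: e.parentID, // null for the root folder
  }));
  // A parent's path always has one segment fewer than its children's paths,
  // so sorting by depth puts every parent before its children.
  const depth = (p) => (p === "/" ? 0 : p.split("/").length - 1);
  rows.sort((a, b) => depth(a.path) - depth(b.path));
  return rows;
}

// Group rows by parent ID once, so listing a folder is a single lookup.
function groupByParent(rows) {
  const byParent = new Map();
  for (const row of rows) {
    if (!byParent.has(row.parent)) byParent.set(row.parent, []);
    byParent.get(row.parent).push(row);
  }
  return byParent;
}
```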
Alternatively, you can trade disk usage for CPU usage by removing the database entirely and exposing a single API endpoint such as `https://api.com/folders/:tree`, where `:tree` is the folder path requested by the client.
For instance, when I start the application, I'll probably request the `https://api.com/folders/` endpoint, and you can send back only the contents of that folder without its subfolders, which removes the need for a recursive algorithm. Then, if I need to go into the `/home` folder, I can call the `https://api.com/folders/home` endpoint. Next, I'll go into the `/home/johndoe` folder with `https://api.com/folders/home/johndoe`, then into the `/home/johndoe/code` folder with `https://api.com/folders/home/johndoe/code`, and so on.
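A minimal sketch of the database-free endpoint, assuming a Node.js backend (the answer does not name one). `SHARE_ROOT`, its `/srv/share` default, and the helper names are hypothetical. Because the folder path comes straight from the URL, the sketch also rejects paths that would escape the shared root, such as an encoded `../`; any endpoint like this needs that check.

```javascript
const fs = require("node:fs/promises");
const http = require("node:http");
const path = require("node:path");

// Hypothetical location of the shared directory; adjust for your server.
const ROOT = process.env.SHARE_ROOT || "/srv/share";

// Map "/folders/home/johndoe" to <root>/home/johndoe, returning null for any
// path that would escape the root (e.g. "/folders/..%2F..%2Fetc").
function resolveInside(root, urlPath) {
  const base = path.resolve(root);
  const rel = decodeURIComponent(urlPath.replace(/^\/folders\/?/, ""));
  const abs = path.resolve(base, rel);
  const prefix = base.endsWith(path.sep) ? base : base + path.sep;
  return abs === base || abs.startsWith(prefix) ? abs : null;
}

// List one level only: the folder's direct children, no recursion.
async function listFolder(root, urlPath) {
  const dir = resolveInside(root, urlPath);
  if (dir === null) return null;
  const entries = await fs.readdir(dir, { withFileTypes: true });
  // A Dirent describes a symlink itself, not its target, so links show as files.
  return entries.map((d) => ({
    name: d.name,
    type: d.isDirectory() ? "folder" : "file",
  }));
}

// Read-only endpoint: GET /folders/<path> responds with a JSON listing.
function createServer(root) {
  return http.createServer(async (req, res) => {
    const { pathname } = new URL(req.url, "http://localhost");
    if (req.method !== "GET" || !/^\/folders(\/|$)/.test(pathname)) {
      res.writeHead(404).end();
      return;
    }
    try {
      const listing = await listFolder(root, pathname);
      if (listing === null) {
        res.writeHead(403).end();
        return;
      }
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(listing));
    } catch {
      // Missing folder, not a directory, unreadable, or a malformed %-escape.
      res.writeHead(404).end();
    }
  });
}

// Usage: createServer(ROOT).listen(3000);
```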
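You can also pass the requested path as a parameter instead of encoding it in the URL path. Avoid putting it in the body of a `GET` request, though: a `GET` body has no generally defined semantics in HTTP, and the browser's `fetch()` refuses to send one. A query string works instead (or a `POST` with a JSON body such as `{"path": "/home/johndoe/code"}`):
```http
GET /folders?path=/home/johndoe/code HTTP/2
Host: api.com
```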
On the client side, you can display a loader while waiting for the API to respond with the current folder's contents. This approach obviously makes many more API calls, but it saves a lot of disk usage. And if you need more speed or less CPU usage, you can write this part in a compiled language such as V, Go, or Rust, as a microservice whose sole purpose is to list the contents of a folder quickly.
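On the client, the two ideas above (passing the path as a parameter and showing a loader while waiting) can be combined as in this sketch. It assumes the query-string form `/folders?path=...` (since the browser's `fetch()` will not send a body with `GET`), the placeholder host `https://api.com`, and a hypothetical `onLoading` callback such as a React state setter; `fetch` is global in browsers and Node 18+.

```javascript
// Build the request URL; URLSearchParams takes care of the encoding.
function folderUrl(base, folderPath) {
  const url = new URL("/folders", base);
  url.searchParams.set("path", folderPath);
  return url.toString();
}

// Fetch one folder's listing, toggling the loader around the request.
async function loadFolder(base, folderPath, onLoading) {
  onLoading(true);
  try {
    const res = await fetch(folderUrl(base, folderPath));
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } finally {
    onLoading(false); // hide the loader on success and on error
  }
}
```

For example, `folderUrl("https://api.com", "/home/johndoe/code")` yields `https://api.com/folders?path=%2Fhome%2Fjohndoe%2Fcode`.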