As far as I know, If I want to create a new database, normally fistly I have to create and switch on a server (in this case a local server in my computer), Secondly I connect my RDBMS (MySql) to my local server, and Thirdly I use SQL to create it.
I was wondering firstly if is it possible to create a database just using a hardrive disk without creating any server.
Secondly If the answer is "Yes", then which are the advantages of using a server?
I mean, This is my logic: If my local server use my the hardware of my pc (and therefore it has to use my hard drive disk to store information) why do I have to use it instead of just storing the info in my hard drive disk directly?
What are the functions of the local server that my computer isn't capable of doing without creating it?
I must admit the fact that I'm not sure up to what point does my question make sense, since I may not know what a server is in depth and what are all the possibilities that this concept brings.
I've tryied looking for info on internet. Indeed I'm using ChatGPT to help me realized about any variables I don't know (Although it sometimes gives wrong information, so I try to contrast it).
The truth is that "I don't know what I don't know", therefore I thought It would be better to ask someone to clarify my doubts and fill my gaps.
CodePudding user response:
SQLite is a SQL database system that doesn't use a separate database server program. It is a code library that simply accesses a file from your program. It's efficient, lightweight, and portable. It's great if you have a program that needs SQL but doesn't need shared data. Many mobile phone apps use local SQLite databases internally; you probably have several on your phone.
Most SQL database systems (MySQL, MariaDB, postgreSQL, Oracle, Microsoft SQL Server, IBM DB2, that lot) use a separate server because their key design objective is to allow many different users on many different computers to share the same data. A reservation system for an airline is the classic example: If I reserve seat 12C, that seat must then be unavailable for you to reserve. Client-server database systems handle the concurrent access and transaction control needed to do that kind of data sharing.
You're quite right that a relational database server that you can only connect to with localhost
is, from a network architecture point of view, overkill. But these localhost
database servers are often used for developing software. When the software is completed and deployed, it connects to a shared server.
Each make of SQL database has its own SQL dialect. Many widely-deployed applications (WordPress, Stack Overflow, etc) chose one SQL dialect many years ago, and so require that kind of database. You'll run into this if you create a program for SQLite, then have to adapt it to use, I dunno, MySQL or Oracle. The big expensive SQL databases call this phenomenon "customer lock-in".
Finally, production data typically has a far longer lifetime than the programs that use it. I've worked on code for SQL databases that contain tables created 20 years ago, back when a terabyte of hard-drive storage cost a small fortune. The client-server model makes that possible without the need to restructure large quantities of data for every new software release.
CodePudding user response:
I think I understand the intent behind "just the hard disk" that you mentioned and I will answer the intent behind that question momentarily. But there are some technical reasons why the way you phrased this makes your question invalid. Mainly because any database that persists data is storing that data in long term storage (a.k.a. the "disk"), Mysql, MSSQL, MongoDB, Excel, etc all eventually write their data to the disk. In addition to this, another reason why it's hard to answer your question directly is because some sort of program is required to interface with that data anyhow, and on different levels. MySQL has it's server software, it's client cli, and it's various client connectors. MSSQL has its server, it's Sql Server Managment Studio client, various connectors and third party tools. Excel has... Excel, or Whatever openoffice alternative there is out there. So there has to be something that allows you to interact with this data and make it available to the various endpoints that need it.
The reason why I think I understand the intent behind your question is the focus on "server" and your questions about the benefits of it vs direct hard drive writes. So here are a few alternatives to a legitimate database server and their comparisons which might make sense in your Scenario.
CSV Files
CSV files, or comma delimited files, are essentially text files stored directly on disk. In your application software you can access these files using the typical file streams. CSV files are widely used and as a result each language and software ecosystem have their own libraries or techniques of interfacing with CSV files. CSV files can also be manipulated with something as simple as a basic text editor such as Notepad. Many worksheet software such as Microsoft Excel, OpenOffice, or Google Sheets are able to work with and manipulate these CSV files, however without special care taken with these tools they can corrupt the files. The drawback to using CSV data is that some data formats are difficult to work with, it is hard to maintain relational data, or multiple tables, and you will never get the same IO performance from a home-grown CSV implementation as you would a database. In addition the data will only be available to the machine which the data resides without filesharing being implemented, which again you will never match the networking performance of a proper database which has decades of development behind it.
Worksheets
Similar to CSV files with the same benefits and drawbacks. However they use a special format, usually proprietary, and are difficult if not impossible to modify using notepad. They also will usually require some sort of external library to interface with them using custom code.
SQLite
Note: This option is probably what you are really after. That is because SQLite is kind of like a database, but without the networking component. You get all the same caching, io and relational benefits. but you loose all the networking and complex setup that comes with a database server. This is a great solution for an application that is ever intended to run locally and not share it's data across users or devices. Many mobile applications on android use this technology or something similar.
So why use a database?
MySQL, MariaDB, Mongo and MSSQL are all database servers which allow you to persist data across instances of your application and across users and devices. The real benefit to a database server is when you have multiple instances of your application modifying the same data set all at the same time. It is even more necessary if you have multiple users accessing the same dataset at the same time or if multiple devices need access to the same dataset all at the same time. If none of this sounds like it is a part of your requirements, then you are free to choose an alternative to a database server such as SQLite.
CodePudding user response:
You don't need to.
You can use SQLite that runs in the same process as your app without the need of a server.
Or... you can simply install MySQL locally in your machine.
Also, if you use Java, there are embedded databases such as H2, Derby, and HyperSQL that behave like a typical database, but actually store all data in local files, in the folder of your choosing.
H2 even has a "MySQL emulation mode" where it implements the common SQL behavior of MySQL.
Of course, having a database running in a separate server allows multiple users to connect to the same database to perform separate tasks. However, sometimes you only need a single user access, and you can avoid the use (and payment) of a server.
CodePudding user response:
All RDBMS have a concept of an engine, that is the component that actually interprets your queries and fetches the results. Without this engine your database is nothing more than storage. This engine is universally referred to as the server and it does not necessarily need to be executed on dedicated hardware, though it is often advantageous to do so. Most RDBMS also have a version of their engine that can be run locally so in essence you always need this engine if you want to query your data, even if it is in the form of a local library or executable that is running dedicated to your process.
- The implementation of a server component, even if you choose to run that server locally is simply an abstraction so that the management and access to the database is decoupled from the way your program can interact with it.
You can use a text file for your database, you don't have to use SQL at all, you could serialize your data and persist it in the text file, but read it into memory for the lifetime of your app. When you do this though, you are taking all of the responsibility for the storage and management of the data in your own code.
A key concept of an SQL based database is the ACID Principal and the ability for the server to marshal or queue multiple concurrent queries and still maintain the ACID principals: Atomicity, Consistency, Isolation and Durability. Most of the modern RDBMS have evolved to provide great performance but the manner and frequency of database queries is different to traditional file based access that your computer OS was designed for. Databases require a lot of memory and CPU and access to low-latency storage to operate at their greatest efficiency.
If your database is only going to be accessed from a single program, on a standalone device, then you might easily get away with using a local variant of your RDBMS of choice. I have written many applications using the LocalDB and Express versions of MS SQL Server. But in our connected world many of our applications can benefit from multiple devices sharing the same database. When this happens you either need to dedicate one of the devices to be the server, and may need to be left powered on perpetually, or you could run the server on dedicated hardware, or in the cloud.
Another consideration for the enterprise grade variants of database engines is that to achieve greater performance they need to operate closer to the metal, they need greater security privileges and lower level access to the file system than many standard workstation operators can grant. This causes all sorts of dilemmas, including who can install and operate the database. You might even prevent anti-virus software from scanning database related files as the database may want exclusive access to the files, or in many cases the way they operate is very similar to the profile of common viruses and malware. This poses a significant security risk to corporate data, so often the database engine will be hosted on a dedicated server running under a specific security context where we do not actually allow common workstation users to login to the console at all. As these servers can be vulnerable, firewalls are typically deployed to harden them from attack and reduce the attack surface area.
All of this can be a lot to manage on a local workstation or device.
Databases are traditionally very IO intensive, if there is not enough capacity to load everything into memory then the filesystem must be utilized. Depending on the other requirements of your programmed logic, there may not be sufficient resources to run the type of database and achieve the performance that you desire if you run the database engine on the same device as your executing program.
If your app will never grow to a point where there are multiple access points across multiple devices, then it might be suitable to stick with a local RDBMS. But many of us have grand visions for our software so it might make more sense to host the database on resources that will either support the growth we want or can be scaled to meet the demand.
CodePudding user response:
MySQL is a client/server database. Many client applications can connect to the MySQL Server process, either locally or over a network. One way to do that is to run a service process that governs their access to the data files.
If there were no service process, and all the client applications tried to read and write the data files directly, they would frequently conflict and they would end up corrupting the data.
The alternative is for each client application to connect to the data file, lock it using file locking requests (a POSIX operating system provides this feature), and do their read or write operations. But the disadvantage of this is that only one client at a time can use the database.
Also, file-locking only works locally, so it makes it difficult for remote clients to use the database over a network. There are shared volume technologies like SMB or NFS, but these aren't fast or reliable or synchronous enough for the needs of client applications using a database.
The file-locking strategy is how an embedded database like SQLite shares access to a data file, to allow multiple clients to share the database concurrently. Read https://www.sqlite.org/lockingv3.html for a description and you will understand how it is a difficult task to coordinate multiple clients in this way. And it still only works for local clients.
If you use a software service as the single process with direct read/write access to the data files, then the code of that process is responsible for avoiding conflicts. It can support greater concurrency, because each client is typically working on different subsets of the data. It can implement buffers in RAM to help speed up access to data. It can implement ACID features. It can enforce authentication and authorization controls.
So the advantages of using a software service include:
- Better support for concurrent clients using the database (both local and remote)
- Overall better transaction throughput
- Better reliability and reduced risk of data corruption
- Better use of system resources like RAM
- More features that are not possible without using a service
MySQL used to have a feature called libmysqld
, so a client application could run the MySQL engine as a library linked directly into the application, without running a MySQL Server process. It was deprecated in version 5.7 and removed from the 8.0 product. Hardly any MySQL users used this feature, and it was a burden to keep maintaining it. It was just increasing the size of the MySQL download package for nearly no value.