I need to update a MongoDB database from multiple files.
Pre-conditions:
Environment - AWS
File storage - AWS S3
Database - MongoDB
Number of files - 100-500 (approximately)
Total amount of data - 20-30 megabytes
An important requirement is that data must be validated before being saved to the database.
Validation can be as simple as checking whether a field exists, or more complex: if there are relationships between entities, all of the related entities need to be checked, and they may be spread across different files.
The number of files and the amount of data do not seem very large to me. A possible rough solution would be to load all the data into memory, perform the validation, and then store everything in the database. There are no strict memory or performance limits.
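Roughly what I have in mind is sketched below, assuming boto3 and pymongo; the bucket name, prefix, field names, and validation rules are placeholders for illustration only:

```python
import json

import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
BUCKET = "my-data-bucket"        # placeholder
PREFIX = "imports/2023-10/"      # placeholder


def load_all_documents():
    """Read every JSON file under the prefix into memory."""
    documents = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            documents.extend(json.loads(body))  # assume each file holds a JSON array
    return documents


def validate(documents):
    """Simple checks: required fields plus cross-file relationships."""
    errors = []
    ids = {doc["id"] for doc in documents if "id" in doc}
    for doc in documents:
        if "id" not in doc or "name" not in doc:       # field-existence check
            errors.append(f"missing required field: {doc}")
        parent = doc.get("parent_id")
        if parent is not None and parent not in ids:   # relationship check
            errors.append(f"unknown parent_id {parent} in {doc.get('id')}")
    return errors


def main():
    documents = load_all_documents()
    errors = validate(documents)
    if errors:
        raise ValueError(f"validation failed: {errors[:10]}")
    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    client["mydb"]["entities"].insert_many(documents)


if __name__ == "__main__":
    main()
```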
Perhaps one of you has solved a similar problem?
CodePudding user response:
Whether you load this into memory or use staging collections in the database (the MongoDB equivalent of staging tables), that sounds like a sensible approach, as you can then process everything in a single transaction. Personally I'd go down the staging route, as it's likely to be easier and more decoupled to build; I'd guess the different files have some form of unique identifier in the filename that tells you what type of file it is and hence the data structure.
Get the data into staging collections first, and you can then validate and reshape it properly, and significantly more easily, before writing it to the real collections.
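A minimal sketch of that staging-collection flow, assuming pymongo; the collection names, field names, and validation checks are placeholders, and multi-document transactions require a replica set or Atlas cluster:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["mydb"]


def stage_file(filename, documents):
    """Dump raw file contents into the staging collection, tagged with their source file."""
    db["staging_entities"].insert_many(
        [{**doc, "_source_file": filename} for doc in documents]
    )


def promote_staged():
    """Validate staged data, then copy it into the real collection in one transaction."""
    staged = list(db["staging_entities"].find({}))
    ids = {doc.get("id") for doc in staged}

    # Field-existence and relationship checks across all staged files.
    bad = [
        d for d in staged
        if "id" not in d
        or (d.get("parent_id") is not None and d["parent_id"] not in ids)
    ]
    if bad:
        raise ValueError(f"{len(bad)} invalid documents, aborting promotion")

    with client.start_session() as session:
        with session.start_transaction():
            db["entities"].insert_many(
                [{k: v for k, v in d.items() if k != "_source_file"} for d in staged],
                session=session,
            )
            db["staging_entities"].delete_many({}, session=session)
```

Keeping a `_source_file` field on each staged document also makes it easy to report which file an invalid record came from, which helps when the relationship checks span several files.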