I found some answers on Stack Overflow, but nothing exactly fits my needs.
I am writing a Ruby script to find rows by a specific key in large CSV files (~500 MB and ~1M records per file).
The grep command takes 15-30 minutes to find a match in a single file.
I have 400 files, and I have to run dozens of searches daily.
I need a simple, flexible and affordable solution to search in files.
- I don't want to upload the CSVs to a robust database engine.
- I don't want to pay for services like Elasticsearch.
- I need to adapt to different column configurations and different keys periodically, with minimal effort.
- I need read-only access to the files. Modifications and deletions are not required. So, indexes are built once and won't require further modifications.
CodePudding user response:
I finally spent a day of work and developed this solution: CSV-Indexer.
CSV-Indexer is not as robust as Lucene, but it is simple and cost-effective. It can index files with millions of rows and find specific rows in a matter of seconds.
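The core idea can be sketched in plain Ruby. This is a minimal illustration of a byte-offset index (hypothetical helper names, not the actual CSV-Indexer API): scan each CSV once, record the file offset of every row keyed by the chosen column, then answer lookups with a single seek instead of a full scan. It assumes no quoted fields contain embedded newlines.

```ruby
require "csv"

# Build a key => byte-offset index for one CSV file.
# key_column is the header name of the column to index on.
def build_index(csv_path, key_column)
  index = {}
  File.open(csv_path, "r") do |f|
    header = CSV.parse_line(f.readline)
    key_idx = header.index(key_column)
    until f.eof?
      offset = f.pos              # byte offset where this row starts
      row = CSV.parse_line(f.readline)
      index[row[key_idx]] = offset
    end
  end
  index
end

# Look up a row by key using the prebuilt index: one seek, one read.
def find_row(csv_path, index, key)
  offset = index[key]
  return nil unless offset
  File.open(csv_path, "r") do |f|
    f.seek(offset)
    CSV.parse_line(f.readline)
  end
end
```

Since the files are read-only, the index can be built once and persisted (e.g. with `Marshal.dump`), so subsequent searches skip the scan entirely.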
Find full documentation and examples here: