General approach to search into large CSV files in Ruby, at a reasonable time & affordable cost

Time:11-16

I found some answers on StackOverflow, but none of them fits my needs exactly.

I am writing a Ruby script to find rows by a specific key in large CSV files (~500MB and 1M records per file).

The grep command takes 15 to 30 minutes to find a match in a single file.

I have 400 files, and I have to run dozens of searches daily.

I need a simple, flexible and affordable way to search these files.

  • I don't want to upload the CSVs to a robust database engine.
  • I don't want to pay for services like Elasticsearch.
  • I need to adapt to different column configurations and different keys periodically, with minimal effort.
  • I only need read-only access to the files. Modifications and deletions are not required, so indexes are built once and never need to be updated.

CodePudding user response:

I ended up spending a day of work developing this solution: CSV-Indexer.

CSV-Indexer is not as robust as Lucene, but it is simple and cost-effective. It can index files with millions of rows and find specific rows in a matter of seconds.

Find full documentation and examples here:

https://github.com/leandrosardi/csv-indexer
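
To illustrate the general "index once, then look up" idea behind this kind of tool, here is a minimal sketch in plain Ruby. It is not CSV-Indexer's actual code or API: `build_index` and `find_rows` are hypothetical helper names, and the sketch assumes each file has a header row, one record per physical line (no newlines inside quoted fields), and an index small enough to hold in memory.

```ruby
require "csv"

# Hypothetical helper: scan the CSV once and return a sorted array of
# [key, byte_offset] pairs for the given column.
# Assumes a header row and one record per physical line.
def build_index(csv_path, key_column)
  entries = []
  File.open(csv_path, "rb:UTF-8") do |f|
    header = CSV.parse_line(f.gets)
    col = header.index(key_column)
    raise ArgumentError, "no column named #{key_column}" if col.nil?
    until f.eof?
      offset = f.pos                      # byte position where this row starts
      row = CSV.parse_line(f.gets)
      next if row.nil? || row.empty?      # skip blank lines
      entries << [row[col].to_s, offset]
    end
  end
  entries.sort!                           # sort by key so we can binary-search
end

# Hypothetical helper: binary-search the index, then seek straight to each
# matching row instead of scanning the whole file.
def find_rows(csv_path, entries, key)
  i = entries.bsearch_index { |k, _| k >= key }  # first entry with k >= key
  return [] if i.nil?
  rows = []
  File.open(csv_path, "rb:UTF-8") do |f|
    while i < entries.size && entries[i][0] == key
      f.seek(entries[i][1])
      rows << CSV.parse_line(f.gets)
      i += 1
    end
  end
  rows
end
```

Building the index still reads the whole file once, but every later lookup is a binary search plus a few seeks. Since the files are read-only, the array could be saved with `Marshal.dump` and reloaded with `Marshal.load` so that it only has to be built once per file and key column.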
