How to write an ingest pipeline for Elasticsearch to load a CSV file as nested JSON documents?


I have a CSV file with the following format:

company_id  year  sales  buys  location
3           2020  230    112   europe
3           2019  234    231   europe
2           2020  443    351   usa
2           2019  224    256   usa

When I import it into Elasticsearch I end up with one document per line. However, I would like to import it in the format below:

[
  {
    "company_id" : 3,
    "location" : "europe",
    "2020" : {"sales" : 230, "buys" : 112},
    "2019" : {"sales" : 234, "buys" : 231}
  },
  {
    "company_id" : 2,
    "location" : "usa",
    "2020" : {"sales" : 443, "buys" : 351},
    "2019" : {"sales" : 224, "buys" : 256}
  }
]

Is there a way to write an ingest pipeline (processor) to achieve this?

Thanks in advance for your answers.

CodePudding user response:

At the ingest pipeline level you can only handle one document (i.e. one CSV row) at a time, so to aggregate rows the way you want you need to do it at the Logstash level, using the aggregate filter.

If your rows are correctly sorted by company ID (the aggregation key), you can follow the aggregate filter example in the official documentation; a sketch along those lines is shown below.
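To make that concrete, here is a minimal Logstash configuration sketch, not a drop-in config: the CSV path, the whitespace separator, the "companies" index name and the cluster address are all assumptions you will need to adapt. Note that the aggregate filter only works reliably when Logstash runs with a single pipeline worker (-w 1), so that rows are processed in order.

    input {
      file {
        path => "/path/to/companies.csv"   # assumed location of your CSV file
        start_position => "beginning"
        sincedb_path => "/dev/null"
      }
    }

    filter {
      csv {
        separator => " "                   # assumed separator; change to "," if your file is comma-separated
        columns => ["company_id", "year", "sales", "buys", "location"]
        skip_header => true
      }

      aggregate {
        task_id => "%{company_id}"
        code => "
          map['company_id'] ||= event.get('company_id').to_i
          map['location']   ||= event.get('location')
          map[event.get('year')] = {
            'sales' => event.get('sales').to_i,
            'buys'  => event.get('buys').to_i
          }
          event.cancel()                   # drop the per-row event; only the aggregated map is indexed
        "
        push_previous_map_as_event => true # emit the map as soon as a new company_id shows up
        timeout => 5                       # flush the last company after 5 seconds of inactivity
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"] # assumed cluster address
        index => "companies"               # assumed index name
      }
    }

Each CSV row is cancelled after it has been merged into the map, so only the aggregated per-company documents reach Elasticsearch: push_previous_map_as_event flushes a company's document when the next company_id appears, and the timeout flushes the final one.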

One word of caution, though: if you use each year as a field name, your mapping will keep growing as years go by, and you risk a mapping explosion.
