I have a csv file that has the following format:
| company_id | year | sales | buys | location |
|---|---|---|---|---|
| 3 | 2020 | 230 | 112 | europe |
| 3 | 2019 | 234 | 231 | europe |
| 2 | 2020 | 443 | 351 | usa |
| 2 | 2019 | 224 | 256 | usa |
When I import it into Elasticsearch I end up with one document per line. However, I would like to import it in the format below:
[
{"company_id" : 3,
"location" : "europe",
"2020" : {"sales" : 230, "buys" : 112},
"2019" : {"sales" : 234, "buys" : 231}
},
{"company_id" : 2,
"location" : "usa",
"2020" : {"sales" : 443, "buys" : 351},
"2019" : {"sales" : 224, "buys" : 256}
}
]
Is there a way to write the ingest pipeline (processor) in order to achieve this?
Thanks in advance for your answers.
CodePudding user response:
At the ingest pipeline level you can only handle one document (i.e. one row) at a time, so to aggregate rows the way you want, you need to do it at the Logstash level using the aggregate filter.
If your rows are correctly sorted by company_id, you can follow the push_previous_map_as_event example from the official documentation; a sketch along those lines is shown below.
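Here is a minimal Logstash pipeline sketch based on that approach. The file path, index name, and timeout are assumptions, not values from the question, so adjust them to your setup:

```
input {
  file {
    path => "/path/to/companies.csv"     # assumption: path to your CSV file
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    columns => ["company_id", "year", "sales", "buys", "location"]
  }

  # drop the header row, which the csv filter parses like any other line
  if [company_id] == "company_id" {
    drop { }
  }

  aggregate {
    # group consecutive rows that share the same company_id
    task_id => "%{company_id}"
    code => "
      map['company_id'] ||= event.get('company_id').to_i
      map['location']   ||= event.get('location')
      map[event.get('year')] = {
        'sales' => event.get('sales').to_i,
        'buys'  => event.get('buys').to_i
      }
      event.cancel()
    "
    # emit the accumulated map as a new event when a row
    # with a different company_id arrives
    push_previous_map_as_event => true
    timeout => 5
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumption
    index => "companies"                 # assumption
  }
}
```

Since push_previous_map_as_event relies on rows for the same company_id arriving back to back, run Logstash with a single pipeline worker (-w 1) so events are not reordered.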
One word of caution, though: if you use the year as a field name, your mapping will keep growing as the years go by, and you risk a mapping explosion.
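If that is a concern, a common alternative (just a sketch, not something the question asked for) is to keep the year as a value inside a nested array instead of using it as a field name, so the set of mapped fields stays fixed no matter how many years you ingest:

```
{
  "company_id": 3,
  "location": "europe",
  "years": [
    { "year": 2020, "sales": 230, "buys": 112 },
    { "year": 2019, "sales": 234, "buys": 231 }
  ]
}
```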