I have a csv file that has the following format:
| company_id | year | sales | buys | location |
|---|---|---|---|---|
| 3 | 2020 | 230 | 112 | europe |
| 3 | 2019 | 234 | 231 | europe |
| 2 | 2020 | 443 | 351 | usa |
| 2 | 2019 | 224 | 256 | usa |
When I import it into Elasticsearch I end up with one document per line. However, I would like to import it in the format below:
[
{"company_id" : 3,
"location" : "europe",
"2020" : {"sales" : 230, "buys" : 112},
"2019" : {"sales" : 234, "buys" : 231}
},
{"company_id" : 2,
"location" : "usa",
"2020" : {"sales" : 443, "buys" : 351},
"2019" : {"sales" : 224, "buys" : 256}
}
]
Is there a way to write the ingest pipeline (processor) in order to achieve this?
Thanks in advance for your answers.
CodePudding user response:
At the ingest pipeline level you can only handle one document (i.e. one row) at a time, so to aggregate rows the way you want, you need to do it at the Logstash level using the aggregate filter.
If your rows are correctly sorted by company_id, you can follow the push_previous_map_as_event example from the official documentation; a sketch along those lines is shown below.
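Here is a minimal Logstash pipeline sketch based on that approach. The file path, index name, and timeout are assumptions, not values from the question, so adjust them to your setup:

```
input {
  file {
    path => "/path/to/companies.csv"     # assumption: path to your CSV file
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    columns => ["company_id", "year", "sales", "buys", "location"]
  }

  # drop the header row, which the csv filter parses like any other line
  if [company_id] == "company_id" {
    drop { }
  }

  aggregate {
    # group consecutive rows that share the same company_id
    task_id => "%{company_id}"
    code => "
      map['company_id'] ||= event.get('company_id').to_i
      map['location']   ||= event.get('location')
      map[event.get('year')] = {
        'sales' => event.get('sales').to_i,
        'buys'  => event.get('buys').to_i
      }
      event.cancel()
    "
    # emit the accumulated map as a new event when a row
    # with a different company_id arrives
    push_previous_map_as_event => true
    timeout => 5
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumption
    index => "companies"                 # assumption
  }
}
```

Since push_previous_map_as_event relies on rows for the same company_id arriving back to back, run Logstash with a single pipeline worker (-w 1) so events are not reordered.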
One word of caution, though: if you use the year as a field name, your mapping will keep growing as the years go by, and you risk a mapping explosion.
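If that is a concern, a common alternative (just a sketch, not something the question asked for) is to keep the year as a value inside a nested array instead of using it as a field name, so the set of mapped fields stays fixed no matter how many years you ingest:

```
{
  "company_id": 3,
  "location": "europe",
  "years": [
    { "year": 2020, "sales": 230, "buys": 112 },
    { "year": 2019, "sales": 234, "buys": 231 }
  ]
}
```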