Beginner question: can the Elasticsearch Java client be used inside Spark to operate on Elasticsearch?

Time:09-18

My project needs to use Spark to reimplement an ETL function that was previously written in pure Java. The function reads data from Kafka, performs the ETL, and finally saves the result to Elasticsearch. Saving involves some extra steps: check whether the index whose name is built from the date already exists; if it does not, create it from a specified template; then insert the data into Elasticsearch.
I looked up some articles online about integrating Spark with Elasticsearch. They basically all use the elasticsearch-hadoop component to work with ES. That component seems to support only querying and inserting into a specified index; operations such as creating an index from a template or checking whether an index exists do not seem to be available.
So my question is: can the native Elasticsearch Java clients (such as the Java High Level REST Client or TransportClient) be used inside Spark to operate on Elasticsearch? If not, what is the reason? If so, is there anything I need to watch out for?
Similarly, for other third-party calls, such as database access or RESTful API access, is it possible to bypass Spark's own integrations and call the native Java APIs directly?

CodePudding user response:

You can use a foreachPartition-type operator and, inside that operator, iterate over the data and run your queries with the Java client. My suggestion is to do the filtering with mapPartitions and then write with the methods that elasticsearch-hadoop provides; the performance will be better.
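To make the per-partition idea concrete, here is a minimal sketch of the logic that would run once per partition. It does not depend on Spark or Elasticsearch: `EsGateway`, `PartitionSink`, the `prefix-yyyy.MM.dd` naming scheme, and the batch size are all assumptions made for illustration, not part of any library. In a real job the interface would be backed by whichever client you choose, and `write` would be called from inside the foreachPartition function.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical abstraction over the Elasticsearch client in use;
// this interface is an assumption, not a real library type.
interface EsGateway {
    boolean indexExists(String index);
    void createIndex(String index, String templateName);
    void bulkInsert(String index, List<String> jsonDocs);
}

// Handles one partition: make sure the dated index exists, then insert in batches.
final class PartitionSink {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    private final EsGateway es;
    private final String prefix;
    private final String templateName;
    private final int batchSize;

    PartitionSink(EsGateway es, String prefix, String templateName, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.es = es;
        this.prefix = prefix;
        this.templateName = templateName;
        this.batchSize = batchSize;
    }

    // Builds the date-based index name, e.g. prefix "logs" -> "logs-2020.09.18".
    static String indexName(String prefix, LocalDate day) {
        return prefix + "-" + DAY.format(day);
    }

    // Sends every document of the partition and returns how many were sent.
    int write(Iterator<String> docs, LocalDate day) {
        if (!docs.hasNext()) {
            return 0; // empty partition: no index check, no insert
        }
        String index = indexName(prefix, day);
        if (!es.indexExists(index)) {
            es.createIndex(index, templateName);
        }
        List<String> batch = new ArrayList<>(batchSize);
        int sent = 0;
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                es.bulkInsert(index, new ArrayList<>(batch));
                sent += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            es.bulkInsert(index, new ArrayList<>(batch));
            sent += batch.size();
        }
        return sent;
    }
}
```

Several partitions may run this at the same time and all find the index missing, so a real `createIndex` implementation should probably tolerate an "index already exists" response instead of failing the task.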

CodePudding user response:

Quoting 1st floor, link0007:
You can use a foreachPartition-type operator and, inside that operator, iterate over the data and run your queries with the Java client. My suggestion is to do the filtering with mapPartitions and then write with the methods that elasticsearch-hadoop provides; the performance will be better.


I am using Structured Streaming, where foreachPartition apparently cannot be used, although mapPartitions can. Structured Streaming's writeStream foreach processes the data partition by partition, so its effect should be the same as foreachPartition, right?

CodePudding user response:

Quoting 2nd floor, insiderys:
I am using Structured Streaming, where foreachPartition apparently cannot be used, although mapPartitions can. Structured Streaming's writeStream foreach processes the data partition by partition, so its effect should be the same as foreachPartition, right?


If foreachPartition cannot be used, create the Java client through a singleton factory and perform the subsequent operations with that client.
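A singleton factory of that kind could look roughly like the sketch below. `EsClient` is a hypothetical stand-in for the real client class, and `EsClientFactory` and `getOrCreate` are names invented for this example. The point is that the client sits in a static field, so it is created lazily inside each executor JVM and never has to be serialized together with a Spark closure.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a real Elasticsearch client (not a library class).
final class EsClient implements AutoCloseable {
    // Counts constructions so the example can show that only one instance is built.
    static final AtomicInteger CREATED = new AtomicInteger();

    final String hosts;

    EsClient(String hosts) {
        this.hosts = hosts;
        CREATED.incrementAndGet();
    }

    @Override
    public void close() {
        // A real client would release its connections here.
    }
}

// Static holder: one client per JVM, created on first use.
final class EsClientFactory {
    private static volatile EsClient client;

    private EsClientFactory() {
    }

    // Returns the shared client, creating it on the first call.
    // Later calls ignore the hosts argument and return the existing instance.
    static EsClient getOrCreate(String hosts) {
        EsClient local = client;
        if (local == null) {
            synchronized (EsClientFactory.class) {
                local = client;
                if (local == null) { // re-check under the lock
                    local = new EsClient(hosts);
                    client = local;
                }
            }
        }
        return local;
    }
}
```

With Structured Streaming's writeStream foreach, a call such as `EsClientFactory.getOrCreate(...)` would typically go into the writer's `open` method, and the shared client should not be closed in the writer's `close` method, because other tasks on the same executor may still be using it.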