My Spark Streaming program has two data sources, each coming from a different topic, and the two streams need to be joined by id. The problem is that matching records may not fall in the same batch: a record from the second topic can arrive earlier or later than its counterpart in the first topic, or never arrive at all. How can this be solved?
CodePudding user response:
Use Redis as a cache. Write the records from the first topic that did not find a match to Redis, then read Redis in every batch; when a record finds its match, delete it from Redis.
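A rough sketch of that per-batch logic, in plain Python so it can be run without a cluster. The two dicts `pending_a` and `pending_b` are stand-ins for Redis, and the function name `join_batch` is made up for illustration. Buffering both sides (not only the first topic) is my own extension to cover the case where the second topic arrives first, and the sketch assumes each id appears at most once per topic.

```python
from typing import Dict, List, Tuple

# Stand-in for Redis: maps id -> record that is still waiting for its partner.
Pending = Dict[str, dict]


def join_batch(
    batch_a: List[Tuple[str, dict]],
    batch_b: List[Tuple[str, dict]],
    pending_a: Pending,
    pending_b: Pending,
) -> List[Tuple[str, dict, dict]]:
    """Join one micro-batch by id, buffering records whose partner has not arrived."""
    joined = []
    for key, a in batch_a:
        b = pending_b.pop(key, None)  # partner from topic B seen in an earlier batch?
        if b is not None:
            joined.append((key, a, b))  # matched: the buffered entry is already removed
        else:
            pending_a[key] = a  # no match yet: park it until topic B catches up
    for key, b in batch_b:
        a = pending_a.pop(key, None)  # covers same-batch and earlier-batch partners
        if a is not None:
            joined.append((key, a, b))
        else:
            pending_b[key] = b
    return joined


# Two consecutive batches: id "2" arrives on topic A one batch before topic B.
pending_a, pending_b = {}, {}
out1 = join_batch([("1", {"x": 1}), ("2", {"x": 2})], [("1", {"y": 10})], pending_a, pending_b)
out2 = join_batch([], [("2", {"y": 20}), ("3", {"y": 30})], pending_a, pending_b)
```

For records whose partner never arrives, setting an expiry (TTL) on the Redis keys would be one way to keep the cache from growing without bound; how long to wait depends on how late the other topic can be.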
CodePudding user response:
This scenario is better suited to offline processing: persist the data first, then join it.