I'm very new to Spark and Python. I'm trying to see any metric in Spark Structured Streaming (for example, processedRowsPerSecond
), but I don't know how to do it.
I've read in "Structured Streaming Programming Guide" that with print(query.lastProgress)
you can directly get the current status and metrics of an active query, but if I write it I only obtain None
once. The last part of my code is the following:
query = windowedCountsDF\
.writeStream\
.outputMode('update')\
.option("truncate", "false") \
.format('console') \
.queryName("numbers") \
.start()
print(query.lastProgress)
query.awaitTermination()
Any idea on how to do it will be highly appreciated.
CodePudding user response:
Try with:
while query.isActive:
print("\n")
print(query.status)
print(query.lastProgress)
time.sleep(30)
query.awaitTermination()
blocks query.lastProgress
.
CodePudding user response:
it really depends on what do you want to do with that metric. Your problem is that you're calling query.awaitTermination()
, and it blocks any other activity. If you want to collect metrics, then instead of calling query.awaitTermination()
, you need to implement your own wait loop, something like this:
query = ...
while not query.exception():
if query.lastProgress:
print(query.lastProgress) # do something with your data
time.sleep(10) # wait 10 seconds..