How to see a particular metric in Spark Structured Streaming with Python

Time:12-19

I'm very new to Spark and Python. I'm trying to see any metric in Spark Structured Streaming (for example, processedRowsPerSecond), but I don't know how to do it.

I've read in the "Structured Streaming Programming Guide" that print(query.lastProgress) gives you the current status and metrics of an active query directly, but when I use it I only get None, printed once. The last part of my code is the following:

query = windowedCountsDF\
    .writeStream\
    .outputMode('update')\
    .option("truncate", "false") \
    .format('console') \
    .queryName("numbers") \
    .start()

print(query.lastProgress)

query.awaitTermination()

Any idea on how to do it will be highly appreciated.

CodePudding user response:

Try with:

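import time

# Poll the query from the driver while it is running
while query.isActive:
    print("\n")
    print(query.status)
    print(query.lastProgress)  # None until the first progress update is available
    time.sleep(30)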

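query.awaitTermination() blocks the calling thread until the query stops, so nothing placed after it runs while the stream is active. Your single print(query.lastProgress) runs right after start(), before any progress has been reported, which is why you see None exactly once.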

CodePudding user response:

It really depends on what you want to do with that metric. Your problem is that you're calling query.awaitTermination(), which blocks the calling thread until the query ends, so no other code in that thread gets a chance to run. If you want to collect metrics, then instead of calling query.awaitTermination() you need to implement your own wait loop, something like this:

import time

query = ...

# Loop until the query reports an exception
while not query.exception():
    if query.lastProgress:  # None until the first progress update
        print(query.lastProgress)  # do something with your data
    time.sleep(10)  # wait 10 seconds
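
If you only care about one value, such as processedRowsPerSecond, you can pull it out of the progress object instead of printing the whole thing. Below is a minimal sketch, assuming query.lastProgress can be read like a dict (which is how PySpark exposes the progress JSON) and that a rate field may be missing from some updates. The helper names get_metric and poll_metric are made up for illustration; they are not part of Spark.

```python
import time


def get_metric(progress, name, default=None):
    """Return one metric from a progress dict, or `default` if unavailable.

    `progress` is expected to be what query.lastProgress returns:
    None before the first update, otherwise a dict-like object.
    """
    if not progress:
        return default
    return progress.get(name, default)


def poll_metric(query, name, interval_seconds=10):
    """Print a single metric at a fixed interval while the query is active.

    Hypothetical helper: `query` is a started StreamingQuery.
    """
    while query.isActive:
        value = get_metric(query.lastProgress, name)
        if value is not None:
            print(f"{name}: {value}")
        time.sleep(interval_seconds)
```

You would then call poll_metric(query, "processedRowsPerSecond") in place of query.awaitTermination(). Using .get with a default means the loop does not fail on an update that happens to lack the field.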