We are currently working on real-time data feeds carrying JSON data.
While reading the examples at https://sparkbyexamples.com/spark/spark-streaming-with-kafka/,
it looks like we need a schema for the Kafka JSON messages.
Is there any way to process the data without defining a schema?
CodePudding user response:
Try the code below after starting ZooKeeper, the Kafka server, and the other required services.
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", kafka_bootstrap_servers) \
    .option("subscribe", kafka_topic_name) \
    .option("startingOffsets", "latest") \
    .load()  # use "earliest" to read the topic from the beginning
print("Printing schema of df:")
df.printSchema()
# Deserialize the binary Kafka value column into a string
transaction_detail_df1 = df.selectExpr("CAST(value AS STRING)")

trans_detail_write_stream = transaction_detail_df1 \
    .writeStream \
    .trigger(processingTime='2 seconds') \
    .option("truncate", "false") \
    .format("console") \
    .start()

trans_detail_write_stream.awaitTermination()
Just change the basic configuration (bootstrap servers and topic name) and you should be able to see the output.
CodePudding user response:
You can use the get_json_object Spark SQL function to extract fields from JSON string data without defining any additional schema.
To deserialize the binary key/value, simply use the cast function, as the example above shows.