Read a file of JSON strings in PySpark

I have a file that looks like this:

'{"Name": "John", "Age": 23}'
'{"Name": "Mary", "Age": 21}'

How can I read this file and get a PySpark DataFrame like this:

     Name   | Age
    "John"  | 23
    "Mary"  | 21

Answer:

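First read the file in text format, so that each line becomes one row in a single string column named value. Then strip the surrounding single quotes with trim and use the from_json function to parse each row into two columns.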
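To make the per-row step concrete, here is a plain-Python model (no Spark involved) of what the trim plus from_json expression does to one line. It is only a sketch for intuition, not what Spark executes; the helper name parse_line is hypothetical.

```python
import json


def parse_line(line: str) -> tuple:
    """Hypothetical helper: a plain-Python model of trim + from_json on one line."""
    # Drop the newline, then single quotes from both ends (like trim with a trim string)
    text = line.rstrip("\n").strip("'")
    record = json.loads(text)  # parse the remaining JSON object
    # Keep the same two fields as the 'Name string,Age int' schema
    return record["Name"], record["Age"]


lines = ["'{\"Name\": \"John\", \"Age\": 23}'\n", "'{\"Name\": \"Mary\", \"Age\": 21}'\n"]
for name, age in map(parse_line, lines):
    print(name, age)
```

Each quoted line turns into one (Name, Age) pair, which is what becomes one row of the resulting DataFrame.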

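from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.load(path_to_your_file, format='text')  # one row per line, in a column named "value"
# Strip the surrounding single quotes, parse the JSON with a DDL schema, then flatten the struct
df = df.selectExpr("from_json(trim('\\'' from value), 'Name string,Age int') as data").select('data.*')
df.show(truncate=False)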