I have a file that looks like this:
'{"Name": "John", "Age": 23}'
'{"Name": "Mary", "Age": 21}'
How can I read this file into a PySpark DataFrame like this:
Name | Age
"John" | 23
"Mary" | 21
CodePudding user response:
First read the file in text format, then use the from_json function to convert each row into two columns.
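# Read each line of the file into a single string column named "value".
df = spark.read.load(path_to_your_file, format='text')
# Trim the surrounding single quotes, parse the JSON with an explicit schema, then expand the struct into columns.
df = df.selectExpr("from_json(trim('\\'' from value), 'Name string,Age int') as data").select('data.*')
df.show(truncate=False)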
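To see what the trim and from_json steps do to each line, here is a minimal plain-Python sketch of the same per-row transformation. It does not use Spark; the `parse_line` helper and the hard-coded `lines` list are assumptions made for illustration, with the sample data copied from the question.

```python
import json

# Sample lines from the question: each is a JSON object wrapped in single quotes.
lines = [
    "'{\"Name\": \"John\", \"Age\": 23}'",
    "'{\"Name\": \"Mary\", \"Age\": 21}'",
]

def parse_line(line):
    # Hypothetical helper, not part of any Spark API.
    # Step 1 (the trim step): drop surrounding whitespace and single quotes.
    json_text = line.strip().strip("'")
    # Step 2 (the from_json step): parse the JSON and pick the two fields.
    record = json.loads(json_text)
    return record["Name"], int(record["Age"])

rows = [parse_line(line) for line in lines]
print(rows)  # [('John', 23), ('Mary', 21)]
```

In the Spark version the same two steps run inside the SQL expression, and the schema string 'Name string,Age int' plays the role of the field selection and the int conversion here.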