Home > OS >  Getting NULL values only from get_json_object in PySpark
Getting NULL values only from get_json_object in PySpark

Time:11-17

I have a Spark Dataframe (in Palantir Foundry) with the column "c_temperature". This column contains a JSON string in each row with the following schema:

{"TempCelsiusEndAvg":"24.33","TempCelsiusEndMax":"null","TempCelsiusEndMin":"null","TempCelsiusStartAvg":"22.54","TempCelsiusStartMax":"null","TempCelsiusStartMin":"null","TempEndPlausibility":"T_PLAUSIBLE","TempStartPlausibility":"T_PLAUSIBLE"}

I tried to extract the values (they are sometimes "null" and sometimes wiht values like e.g. "24.33") of the avg temperatures in the new columns "TempCelsiusEndAvg" and "TempCelsiusStartAvg" with the following code:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def flat_json(sessions_finished):
    df = sessions_finished
    df = df.withColumn("new_temperature", F.col('c_temperature').cast(StringType())
    df = df.withColumn("TempCelsiusEndAvg", F.get_json_object("c_Temperature", '$.TempCelsiusEndAvg'))
    df = df.withColumn("TempCelsiusStartAvg", F.get_json_object("c_Temperature", '$.TempCelsiusStartAvg'))
    return df

I wanted to get the new columns filled with doubles like:

...  ----------------- -------------------  ...
... |TempCelsiusEndAvg|TempCelsiusStartAvg| ...
...  ----------------- -------------------  ...
... |            24.33|              22.54| ...
...  ----------------- -------------------  ...
... |            29.28|              25.16| ...
...  ----------------- -------------------  ...
... |             null|               null| ...
...  ----------------- -------------------  ...

The new dataframe contains the columns but they are only filled with null values. Can anyone help me solving this problem?

...  ----------------- -------------------  ...
... |TempCelsiusEndAvg|TempCelsiusStartAvg| ...
...  ----------------- -------------------  ...
... |             null|               null| ...
...  ----------------- -------------------  ...
... |             null|               null| ...
...  ----------------- -------------------  ...
... |             null|               null| ...
...  ----------------- -------------------  ...

There is also a comment in this thread: [https://stackoverflow.com/questions/46084158/how-can-you-parse-a-string-that-is-json-from-an-existing-temp-table-using-pyspar] that describes my problem, but I have no idea how to use this information.

CodePudding user response:

You are don't need to do anything, since the column is already a struct. You can create those columns by accessing them with a .

 df = df.withColumn("TempCelsiusEndAvg", F.col("c_temperature.TempCelsiusEndAvg"))
 df = df.withColumn("TempCelsiusStartAvg", F.col("c_temperature.TempCelsiusStartAvg"))
  • Related