Lets suppose I have such dataframe org
below:
id |raw
123|{"inn":"123”, "prof": "tkie"}
I need to extract from json from column raw
all ids
values to new columns, how to do that?
I tried:
org.withColumn('inn', from_json($"raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn')))
got error:
File "", line 1 org.withColumn('inn', from_json($"raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn'))) ^ SyntaxError: invalid syntax
And:
org.withColumn('inn', from_json("raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn')))
error:
AssertionError Traceback (most recent call last) in () ----> 1 org.withColumn('inn', from_json("raw", MapType(StringType, StringType))).withColumn('inn', col('searchcard'.getItem('inn')))
/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/types.py in init(self, keyType, valueType, valueContainsNull) 342 False 343 """ --> 344 assert isinstance(keyType, DataType), "keyType should be DataType" 345 assert isinstance(valueType, DataType), "valueType should be DataType" 346 self.keyType = keyType AssertionError: keyType should be DataType
CodePudding user response:
This method get_json_object did the trick!
org.withColumn('inn',get_json_object(org.raw, '$.inn')).show(1)
CodePudding user response:
You code contains several problems:
$"raw"
is used with Scala API, in Pyspark usecol("raw")
or directly as string"raw"
- When using
StringType
or any other type, in python you need to add parenthesesStringType()
getItem
is a Column method but your calling it in string ('searchcard'.getItem('inn')
)
Here's the same code corrected with complete working example using from_json
function:
import pyspark.sql.functions as F
from pyspark.sql.types import StringType, MapType
org = spark.createDataFrame([
(123, '{"inn":"123", "prof": "tkie"}')
], ["id", "raw"])
org.withColumn(
'raw',
F.from_json("raw", MapType(StringType(), StringType()))
).select(
'id',
F.col('raw').getItem('inn').alias('inn')
).show()
# --- ---
#| id|inn|
# --- ---
#|123|123|
# --- ---