Home > OS >  Convert array of elements to multiple columns
Convert array of elements to multiple columns

Time:11-01

How I can convert a array (in a column) with a set of elements in a JSON dataset to multiple columns with python, spark or pandas? The data is structured in this form:

root
 |-- items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- idAccount: long (nullable = true)
 |    |    |-- infractionType: string (nullable = true)
 |    |    |-- responseTime: string (nullable = true)
 |    |    |-- status: string (nullable = true)
 |    |    |-- transactionCode: string (nullable = true)

I'm expecting some kind of this:

id idAccount
value value
value value

CodePudding user response:

In Spark SQL, you can access the item in ArrayType or MapType column by using getItem. For example, you want to get the value of the id of first item, you can use df.select(func.getItem(0).getItem('id'))

CodePudding user response:

an array of structs can be exploded into columns using the inline sql function.

here's an example of how it works.

data_sdf = spark.createDataFrame([([(1234, 2345, 3456), (4321, 5432, 6543)],)], 
                                 'items array<struct<id: int, id_acc: int, foo: int>>'
                                 )

#  ---------------------------------------- 
# |items                                   |
#  ---------------------------------------- 
# |[{1234, 2345, 3456}, {4321, 5432, 6543}]|
#  ---------------------------------------- 

# root
#  |-- items: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- id: integer (nullable = true)
#  |    |    |-- id_acc: integer (nullable = true)
#  |    |    |-- foo: integer (nullable = true)

# explode and create new columns using struct fields - using `inline`
data_sdf. \
    selectExpr('inline(items)'). \
    show()

#  ---- ------ ---- 
# |  id|id_acc| foo|
#  ---- ------ ---- 
# |1234|  2345|3456|
# |4321|  5432|6543|
#  ---- ------ ---- 

you can further just select() the required fields after the explosion.

  • Related