How to chain explode and struct field selection?


The dataframe:

from pyspark.sql import functions as F
df = spark.createDataFrame([([(1, 2), (3, 4)],)], 'col_name array<struct<c1:int,c2:int>>')

# +----------------+
# |        col_name|
# +----------------+
# |[{1, 2}, {3, 4}]|
# +----------------+

# root
#  |-- col_name: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- c1: integer (nullable = true)
#  |    |    |-- c2: integer (nullable = true)

I explode the array (the result is a column named col of type struct<c1:int,c2:int>)
and then select every struct field, which takes two selects:

df = df.select(F.explode('col_name'))
df = df.select(
    [f'col.{c}' for c in ('c1', 'c2')]
)
# +---+---+
# | c1| c2|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+

# root
#  |-- c1: integer (nullable = true)
#  |-- c2: integer (nullable = true)

I know I can shorten the second select to just 'col.*'. But I would still have 2 selects.

Question. Is there a method to select struct fields right after the explode with only 1 select?

As the result of the explode has schema struct<c1:int,c2:int>, I thought this would work...

df = df.select(
    [F.explode('col_name')[c] for c in ('c1', 'c2')]
)

AnalysisException: No such struct field c1 in col

Answer:

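Use inline. It explodes an array of structs and turns each struct field into its own top-level column, so one select is enough.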


+---+---+
| c1| c2|
+---+---+
|  1|  2|
|  3|  4|
+---+---+