I found inline() and inline_outer() in the Spark SQL Built-in Functions documentation. The official examples for the two functions are identical, so I can't tell the difference between them, and I'm confused about which one to choose. Can I use either of them in any case?
Thank you in advance.
Answer:
In the PySpark source code, the docstring of inline_outer says:
```python
def inline_outer(col: "ColumnOrName") -> Column:
    """
    Explodes an array of structs into a table.

    Unlike inline, if the array is null or empty then null is produced for each nested column.
    """
```
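To make the docstring concrete, here is a small plain-Python model of which rows each function emits for a single array value. It is only a sketch, not Spark code: structs are modeled as dicts, arrays as lists, SQL NULL as None, and the helper name `inline_rows` is hypothetical.

```python
from typing import Optional, Sequence

def inline_rows(arr: Optional[Sequence[Optional[dict]]],
                fields: Sequence[str],
                outer: bool = False) -> list:
    """Hypothetical model: rows emitted by inline (outer=False) or inline_outer (outer=True)."""
    if not arr:  # the array is null (None) or empty
        # inline emits nothing; inline_outer emits one row of nulls
        return [(None,) * len(fields)] if outer else []
    # A null struct element becomes an all-null row in both variants.
    return [(None,) * len(fields) if s is None else tuple(s.get(f) for f in fields)
            for s in arr]

fields = ["b", "c"]
print(inline_rows([], fields))               # []
print(inline_rows([], fields, outer=True))   # [(None, None)]
print(inline_rows(None, fields, outer=True)) # [(None, None)]
print(inline_rows([{"b": None, "c": 5}, None], fields))  # [(None, 5), (None, None)]
```

The two variants only differ in the `if not arr` branch; non-empty arrays are handled identically.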
And here is the corresponding unit test:
```python
def test_inline(self):
    from pyspark.sql import Row
    from pyspark.sql.functions import inline, inline_outer

    d = [
        Row(structlist=[Row(b=1, c=2), Row(b=3, c=4)]),
        Row(structlist=[Row(b=None, c=5), None]),  # a null field and a null struct
        Row(structlist=[]),                        # an empty array
    ]
    data = self.spark.createDataFrame(d)

    # inline: the empty array produces no rows
    result = [tuple(x) for x in data.select(inline(data.structlist)).collect()]
    self.assertEqual(result, [(1, 2), (3, 4), (None, 5), (None, None)])

    # inline_outer: the empty array produces one row of nulls
    result = [tuple(x) for x in data.select(inline_outer(data.structlist)).collect()]
    self.assertEqual(result, [(1, 2), (3, 4), (None, 5), (None, None), (None, None)])
```
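As you can see, inline_outer produces one extra row, (None, None), for the empty array, while inline produces nothing for it. (The null struct inside the second array yields (None, None) with both functions.) According to the docstring, the same applies when the array itself is null. So the two functions are not interchangeable: use inline when rows with null or empty arrays should be dropped, and inline_outer when they should be kept. This is the same distinction as between explode and explode_outer.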
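The practical consequence shows up when you select other columns next to the generator: with inline, an input row whose array is null or empty disappears entirely. Below is a plain-Python sketch of that behavior, roughly what a query such as `SELECT id, inline(structlist) FROM t` versus `SELECT id, inline_outer(structlist) FROM t` should return. It reuses the test data with an added `id` column and a null-array row (id 4); the `id` column and the helper `select_with_inline` are hypothetical, and this models the semantics rather than Spark's implementation.

```python
def select_with_inline(rows, key_col, array_col, fields, outer=False):
    """Hypothetical model of SELECT key_col, inline[_outer](array_col) FROM rows."""
    result = []
    for row in rows:
        arr = row[array_col]
        if not arr:  # null or empty array
            if outer:
                # inline_outer keeps the input row, padding the struct fields with nulls
                result.append((row[key_col],) + (None,) * len(fields))
            # inline drops the input row entirely
            continue
        for s in arr:
            values = (None,) * len(fields) if s is None else tuple(s.get(f) for f in fields)
            result.append((row[key_col],) + values)
    return result

table = [
    {"id": 1, "structlist": [{"b": 1, "c": 2}, {"b": 3, "c": 4}]},
    {"id": 2, "structlist": [{"b": None, "c": 5}, None]},
    {"id": 3, "structlist": []},
    {"id": 4, "structlist": None},
]

print(select_with_inline(table, "id", "structlist", ["b", "c"]))
# [(1, 1, 2), (1, 3, 4), (2, None, 5), (2, None, None)]
print(select_with_inline(table, "id", "structlist", ["b", "c"], outer=True))
# [(1, 1, 2), (1, 3, 4), (2, None, 5), (2, None, None), (3, None, None), (4, None, None)]
```

With inline, ids 3 and 4 vanish from the result; with inline_outer they survive with null struct fields.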