inline() vs inline_outer() in Spark SQL, Built-in Functions

Time:01-17

I found inline() and inline_outer() in the Spark SQL Built-in Functions documentation. The official examples for the two functions are identical, so I can't tell the difference between them and don't know which one to choose. Are they interchangeable in every case?

Thank you in advance.

CodePudding user response:

In the Spark source code, I found this in the docstring of inline_outer:

def inline_outer(col: "ColumnOrName") -> Column:
    """
    Explodes an array of structs into a table.
    Unlike inline, if the array is null or empty then null is produced for each nested column.
    """

And here is the test for these functions:

def test_inline(self):
    from pyspark.sql.functions import inline, inline_outer

    d = [
        Row(structlist=[Row(b=1, c=2), Row(b=3, c=4)]),
        Row(structlist=[Row(b=None, c=5), None]),
        Row(structlist=[]),
    ]
    data = self.spark.createDataFrame(d)

    result = [tuple(x) for x in data.select(inline(data.structlist)).collect()]
    self.assertEqual(result, [(1, 2), (3, 4), (None, 5), (None, None)])

    result = [tuple(x) for x in data.select(inline_outer(data.structlist)).collect()]
    self.assertEqual(result, [(1, 2), (3, 4), (None, 5), (None, None), (None, None)])

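As you can see, inline_outer produces one extra row, (None, None), for the row whose array is empty, whereas inline produces no output row for it. A null struct inside a non-empty array yields a row of nulls with both functions.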
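To make the rule concrete, here is a minimal plain-Python sketch of the semantics described in the docstring and the test. It is only a model, not Spark's implementation: the function name inline_model, the width parameter, and the use of tuples for structs are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model: a struct is a tuple, None stands for a null struct or a null array.
Struct = Optional[Tuple]


def inline_model(
    arrays: Sequence[Optional[Sequence[Struct]]],
    width: int,
    outer: bool = False,
) -> List[Tuple]:
    """Model inline (outer=False) and inline_outer (outer=True) on a column of arrays."""
    null_row = (None,) * width  # one null per nested column
    rows: List[Tuple] = []
    for arr in arrays:
        if not arr:
            # Null or empty array: inline emits nothing, inline_outer emits one null row.
            if outer:
                rows.append(null_row)
            continue
        for struct in arr:
            # A null struct inside a non-empty array becomes a row of nulls in both variants.
            rows.append(null_row if struct is None else tuple(struct))
    return rows


# Same data as in the test above, with structs written as (b, c) tuples.
column = [
    [(1, 2), (3, 4)],
    [(None, 5), None],
    [],
]

print(inline_model(column, width=2))
print(inline_model(column, width=2, outer=True))
```

So the two are not interchangeable whenever the array column can be null or empty: inline drops those input rows, while inline_outer keeps them as a row of nulls, much like an inner versus a left outer join.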
