a b
'1' 1
'2' 2
'3' 3
'4' 4
I would like to insert a new column c which holds the values of the b column in reverse order, while keeping the other columns unchanged. Example:
a b c
'1' 1 4
'2' 2 3
'3' 3 2
'4' 4 1
In pandas we use temp['b'][::-1] to achieve this. Is the same transformation possible in PySpark as well?
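For context, a minimal pandas sketch of the transformation; temp mirrors the dataframe above, and .values is needed so pandas does not realign the reversed values back onto the original index:

import pandas as pd

temp = pd.DataFrame({"a": ["1", "2", "3", "4"], "b": [1, 2, 3, 4]})
# reverse b positionally; .values drops the index so the new column
# is not realigned back to the original row order
temp["c"] = temp["b"].values[::-1]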
CodePudding user response:
Let's say your dataframe is ordered by column a. You could try performing a self-join on a generated column, rn, that reverses the order; such a column can be generated with row_number, e.g.

Using the PySpark API:
from pyspark.sql import functions as F
from pyspark.sql import Window

output_df = (
    # df1: rows numbered in ascending order of a
    df.withColumn("rn", F.row_number().over(Window.orderBy("a")))
    .alias("df1")
    # df2: the same rows numbered in descending order of a
    .join(
        df.withColumn(
            "rn", F.row_number().over(Window.orderBy(F.col("a").desc()))
        ).alias("df2"),
        ["rn"],
        "inner",
    )
    # keep a and b from df1, and take b from df2 as the reversed column c
    .selectExpr("df1.a", "df1.b", "df2.b as c")
)
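For a quick end-to-end check, here is a minimal sketch; the SparkSession setup and the sample data are assumptions mirroring the question's example, not part of the original answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# build the example dataframe from the question
df = spark.createDataFrame([("1", 1), ("2", 2), ("3", 3), ("4", 4)], ["a", "b"])
# once df exists, run the output_df expression above and inspect the result:
# output_df.show()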
Using Spark SQL:
select
    df1.a,
    df1.b,
    df2.b as c
from (
    -- number rows in ascending order of a
    select *, row_number() over (order by a) rn
    from df
) df1
inner join (
    -- number rows in descending order of a
    select b, row_number() over (order by a desc) rn
    from df
) df2 on df1.rn = df2.rn;
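The same query can be run from PySpark after registering the dataframe as a temporary view; the view name df matches the query above, and output_sql_df is just an illustrative name:

df.createOrReplaceTempView("df")
output_sql_df = spark.sql("""
select df1.a, df1.b, df2.b as c
from (select *, row_number() over (order by a) rn from df) df1
inner join (select b, row_number() over (order by a desc) rn from df) df2
    on df1.rn = df2.rn
""")
output_sql_df.show()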
 a | b | c
---|---|---
 1 | 1 | 4
 2 | 2 | 3
 3 | 3 | 2
 4 | 4 | 1