a b
'1' 1
'2' 2
'3' 3
'4' 4
I would like to insert a new column c which holds the values of the b column in reverse order, while keeping the other columns unchanged. Example:
a b c
'1' 1 4
'2' 2 3
'3' 3 2
'4' 4 1
In pandas we use temp['b'][::-1] to achieve this. Is the same transformation possible in PySpark as well?
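For context, a minimal pandas sketch of the transformation; temp mirrors the dataframe above, and .values is needed so pandas does not realign the reversed values back onto the original index:

import pandas as pd

temp = pd.DataFrame({"a": ["1", "2", "3", "4"], "b": [1, 2, 3, 4]})
# reverse b positionally; .values drops the index so the new column
# is not realigned back to the original row order
temp["c"] = temp["b"].values[::-1]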
CodePudding user response:
Let's say your dataframe is ordered by column a. You could try performing a self-join on a generated column, rn, that reverses the order; such a column can be generated with row_number, e.g.

Using the PySpark API:
from pyspark.sql import functions as F
from pyspark.sql import Window

output_df = (
    # df1: rows numbered in ascending order of a
    df.withColumn("rn", F.row_number().over(Window.orderBy("a")))
    .alias("df1")
    # df2: the same rows numbered in descending order of a
    .join(
        df.withColumn(
            "rn", F.row_number().over(Window.orderBy(F.col("a").desc()))
        ).alias("df2"),
        ["rn"],
        "inner",
    )
    # keep a and b from df1, and take b from df2 as the reversed column c
    .selectExpr("df1.a", "df1.b", "df2.b as c")
)
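For a quick end-to-end check, here is a minimal sketch; the SparkSession setup and the sample data are assumptions mirroring the question's example, not part of the original answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# build the example dataframe from the question
df = spark.createDataFrame([("1", 1), ("2", 2), ("3", 3), ("4", 4)], ["a", "b"])
# once df exists, run the output_df expression above and inspect the result:
# output_df.show()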
Using Spark SQL:
select
    df1.a,
    df1.b,
    df2.b as c
from (
    -- number rows in ascending order of a
    select *, row_number() over (order by a) rn
    from df
) df1
inner join (
    -- number rows in descending order of a
    select b, row_number() over (order by a desc) rn
    from df
) df2 on df1.rn = df2.rn;
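The same query can be run from PySpark after registering the dataframe as a temporary view; the view name df matches the query above, and output_sql_df is just an illustrative name:

df.createOrReplaceTempView("df")
output_sql_df = spark.sql("""
select df1.a, df1.b, df2.b as c
from (select *, row_number() over (order by a) rn from df) df1
inner join (select b, row_number() over (order by a desc) rn from df) df2
    on df1.rn = df2.rn
""")
output_sql_df.show()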
 a | b | c
---|---|---
 1 | 1 | 4
 2 | 2 | 3
 3 | 3 | 2
 4 | 4 | 1