Create new columns by comparing values from existing columns

Time:06-18

Input dataframe:

Item L W H
I1 3 5 8
I2 2 1 2
I3 6 9 1
I4 7 3 4

The output dataframe should look like the one below. Create three new columns, L_n, W_n and H_n, from the values in the L, W and H columns: L_n is the longest dimension, W_n is the middle one, and H_n is the shortest.

Item L W H L_n W_n H_n
I1 3 5 8 8 5 3
I2 2 1 2 2 2 1
I3 6 9 1 9 6 1
I4 7 3 4 7 4 3

CodePudding user response:

I suggest building an array from the three columns (array), sorting it in ascending order (array_sort), and picking the elements by position (element_at, which uses 1-based indexes).

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [('I1', 3, 5, 8),
     ('I2', 2, 1, 2),
     ('I3', 6, 9, 1),
     ('I4', 7, 3, 4)],
    ['Item', 'L', 'W', 'H']
)
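# Collect the three dimensions into one array column and sort it ascending.
arr = F.array_sort(F.array('L', 'W', 'H'))
df = df.select(
    '*',
    # element_at is 1-based: position 3 is the largest, 1 the smallest.
    F.element_at(arr, 3).alias('L_n'),
    F.element_at(arr, 2).alias('W_n'),
    F.element_at(arr, 1).alias('H_n'),
)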
df.show()
# +----+---+---+---+---+---+---+
# |Item|  L|  W|  H|L_n|W_n|H_n|
# +----+---+---+---+---+---+---+
# |  I1|  3|  5|  8|  8|  5|  3|
# |  I2|  2|  1|  2|  2|  2|  1|
# |  I3|  6|  9|  1|  9|  6|  1|
# |  I4|  7|  3|  4|  7|  4|  3|
# +----+---+---+---+---+---+---+
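
To check the expected values without a Spark session, the same sort-and-pick rule can be sketched in plain Python. This is only an illustration of the logic, not part of the PySpark solution; the helper name add_sorted_dims is made up for this example.

```python
# Plain-Python check of the sort-and-pick rule (no Spark required).
# add_sorted_dims is a hypothetical helper used only for illustration.
rows = [
    ("I1", 3, 5, 8),
    ("I2", 2, 1, 2),
    ("I3", 6, 9, 1),
    ("I4", 7, 3, 4),
]


def add_sorted_dims(row):
    """Append the dimensions ordered longest, middle, shortest."""
    item, l, w, h = row
    # sorted() returns ascending order, so unpack shortest first.
    h_n, w_n, l_n = sorted((l, w, h))
    return (item, l, w, h, l_n, w_n, h_n)


result = [add_sorted_dims(r) for r in rows]
for r in result:
    print(*r)
```

Each printed row should match the corresponding row of the expected output table in the question, including the tie in I2, where the two equal values fill the longest and middle slots.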