pyspark- elif statement and assign position to extract a value


I have a dataframe like the example below:

productNo  productMT                            productPR                         productList
2389       ['xy-5', 'yz-12', 'zb-56', 'iu-30']  ['pr-1', 'pr-2', 'pr-3', 'pr-4']  ['67230', '7839', '1339', '9793']
6745       ['xy-4', 'yz-34', 'zb-8', 'iu-9']    ['pr-6', 'pr-1', 'pr-3', 'pr-7']  ['1111', '0987', '8910', '0348']

I would like to use elif-style logic for multiple conditions: look at productMT, and if it passes a condition, look at productPR and take the position at which it satisfies the corresponding condition.

For example: if productMT contains xy-5 and productPR contains pr-1, take the position of pr-1 and add a new column holding the value at that position in productList (the intended per-row logic is sketched in plain Python below the expected output).

productNo  productMT                            productPR                         productList                        output
2389       ['xy-5', 'yz-12', 'zb-56', 'iu-30']  ['pr-1', 'pr-2', 'pr-3', 'pr-4']  ['67230', '7839', '1339', '9793']  67230
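
In plain Python terms, the per-row logic I have in mind is roughly the following (just a sketch to illustrate the intent; the function name is made up):

def pick_value(productMT, productPR, productList):
    # if: productMT must contain 'xy-5' ...
    if 'xy-5' in productMT:
        # ... then take productList at the position where productPR holds 'pr-1'
        if 'pr-1' in productPR:
            return productList[productPR.index('pr-1')]
    # further elif-style condition pairs would follow here
    return None

pick_value(['xy-5', 'yz-12', 'zb-56', 'iu-30'],
           ['pr-1', 'pr-2', 'pr-3', 'pr-4'],
           ['67230', '7839', '1339', '9793'])   # -> '67230'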

I tried using a filter, but it only handles a single condition, and I need to run multiple filters so that it loops through all rows and condition pairs.

filtered = F.filter(
    F.arrays_zip('productList', 'productMT', 'productPR'),
    lambda x: (x.productMT == 'xy-5') & (x.productPR != 'pr-1')
)
df_array_pos = (df_array
    .withColumn('output', filtered[0].productList)
    .withColumn('flag', filtered[0].productMT))
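
Conceptually, what I am after is something like this generalisation of the filter idea that loops over several (productMT, productPR) condition pairs; the pair list and the pick helper below are just placeholders I made up, and I am not sure this is the right approach:

from pyspark.sql import functions as F

# placeholder if/elif condition pairs, checked in order
condition_pairs = [('xy-5', 'pr-1'), ('xy-4', 'pr-6')]

def pick(mt, pr):
    # productList value at the position where productPR equals pr;
    # indexing with [0] gives null when nothing matched (default, non-ANSI settings)
    hit = F.filter(F.arrays_zip('productList', 'productPR'),
                   lambda x: x.productPR == pr)[0].productList
    # keep it only when productMT also contains mt, otherwise null
    return F.when(F.array_contains('productMT', mt), hit)

# the first non-null candidate wins, mimicking if/elif
df_array_pos = df_array.withColumn(
    'output', F.coalesce(*[pick(mt, pr) for mt, pr in condition_pairs])
)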

CodePudding user response:

You just need to chain multiple when functions, one for each elif condition you want.

Your sample data

df = spark.createDataFrame([
    (2389, ['xy-5', 'yz-12','zb-56','iu-30'], ['pr-1', 'pr-2', 'pr-3', 'pr-4'], ['67230','7839','1339','9793']),
    (6745, ['xy-4', 'yz-34','zb-8','iu-9'], ['pr-6', 'pr-1', 'pr-3', 'pr-7'], ['1111','0987','8910','0348']),
], ['productNo', 'productMT', 'productPR', 'productList'])

+---------+---------------------------+------------------------+-------------------------+
|productNo|productMT                  |productPR               |productList              |
+---------+---------------------------+------------------------+-------------------------+
|2389     |[xy-5, yz-12, zb-56, iu-30]|[pr-1, pr-2, pr-3, pr-4]|[67230, 7839, 1339, 9793]|
|6745     |[xy-4, yz-34, zb-8, iu-9]  |[pr-6, pr-1, pr-3, pr-7]|[1111, 0987, 8910, 0348] |
+---------+---------------------------+------------------------+-------------------------+

You can add as many when clauses as you like.

from pyspark.sql import functions as F

(df
    .withColumn('output', F
        .when(
            F.array_contains('productMT', 'xy-5') & F.array_contains('productPR', 'pr-1'),
            F.col('productList')[F.array_position('productMT', 'xy-5') - 1]
        )
    )
    .show(10, False)
)

+---------+---------------------------+------------------------+-------------------------+------+
|productNo|productMT                  |productPR               |productList              |output|
+---------+---------------------------+------------------------+-------------------------+------+
|2389     |[xy-5, yz-12, zb-56, iu-30]|[pr-1, pr-2, pr-3, pr-4]|[67230, 7839, 1339, 9793]|67230 |
|6745     |[xy-4, yz-34, zb-8, iu-9]  |[pr-6, pr-1, pr-3, pr-7]|[1111, 0987, 8910, 0348] |null  |
+---------+---------------------------+------------------------+-------------------------+------+
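
To turn this into an if/elif/else chain, keep appending .when calls and finish with an optional .otherwise. A sketch is below; the second condition pair is made up purely for illustration, and F.array_position('productPR', 'pr-1') can be swapped in if the position should come from productPR rather than productMT:

from pyspark.sql import functions as F

(df
    .withColumn('output', F
        # if: xy-5 in productMT and pr-1 in productPR
        .when(
            F.array_contains('productMT', 'xy-5') & F.array_contains('productPR', 'pr-1'),
            F.col('productList')[F.array_position('productMT', 'xy-5') - 1]
        )
        # elif: a second, made-up condition pair
        .when(
            F.array_contains('productMT', 'xy-4') & F.array_contains('productPR', 'pr-6'),
            F.col('productList')[F.array_position('productMT', 'xy-4') - 1]
        )
        # else: null (or any default literal you prefer)
        .otherwise(F.lit(None))
    )
    .show(10, False)
)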